Abstract

In Part I of this report, we introduced a Byzantine fault-tolerant distributed optimization problem
whose goal is to optimize a sum of convex (cost) functions with real-valued scalar input/output.
In particular, the goal is to optimize a global cost function $\frac{1}{|\mathcal{N}|}\sum_{i \in \mathcal{N}} h_i(x)$,
where $\mathcal{N}$ is the set of non-faulty agents, and $h_i(x)$ is agent $i$'s local cost function, which is
initially known only to agent $i$. In general, when some of the agents may be Byzantine faulty, the above goal
is unachievable. Therefore, in Part I, we studied a weaker version of the problem whose goal is to generate an
output that is an optimum of a function formed as a convex combination of local cost functions
of the non-faulty agents. We showed that the maximum achievable number of weights ($\alpha_i$'s) that
are bounded away from 0 is $|\mathcal{N}| - f$, where $f$ is the upper bound on the number of Byzantine agents.

In this second part, we introduce a condition-based variant of the original problem over arbitrary
directed graphs. Specifically, for a given collection of $k$ input functions $h_1(x), \ldots, h_k(x)$, we consider
the scenario when the local cost function stored at agent $j$, denoted by $g_j(x)$, is formed as a convex
combination of the $k$ input functions $h_1(x), \ldots, h_k(x)$. The goal of this condition-based problem is
to generate an output that is an optimum of $\frac{1}{k}\sum_{i=1}^{k} h_i(x)$. Depending on the availability of side
information at each agent, two slightly different variants are considered. We show that for a given
graph, the problem can indeed be solved despite the presence of faulty agents. In particular, even in
the absence of side information at each agent, when adequate redundancy is available in the optima
of input functions, a distributed algorithm is proposed in which each agent carries minimal state
across iterations.

Keywords: Distributed optimization; Byzantine faults; incomplete networks; fault-tolerant computing

1 System Model and Problem Formulation 

The system under consideration is synchronous, and consists of $n$ agents connected by an arbitrary
directed communication network $G(\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, n\}$ is the set of $n$ agents, and $\mathcal{E}$ is the
set of directed edges between the agents in $\mathcal{V}$. Up to $f$ of the $n$ agents may be Byzantine faulty.

* This research is supported in part by National Science Foundation awards NSF 1329681 and 1421918. Any opinions, 
findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect 
the views of the funding agencies or the U.S. government. 
Let $\mathcal{F}$ denote the set of faulty agents in a given execution. Agent $i$ can reliably transmit messages
to agent $j$ if and only if the directed edge $(i,j)$ is in $\mathcal{E}$. Each agent can send messages to itself as
well; however, for convenience, we exclude self-loops from the set $\mathcal{E}$. That is, $(i,i) \notin \mathcal{E}$ for $i \in \mathcal{V}$. With a
slight abuse of terminology, we will use the terms edge and link interchangeably, and use the terms
nodes and agents interchangeably in our presentation.

For each agent $i$, let $N_i^-$ be the set of agents from which $i$ has incoming edges. That is,
$N_i^- = \{j \mid (j,i) \in \mathcal{E}\}$. Similarly, define $N_i^+$ as the set of agents to which agent $i$ has outgoing
edges. That is, $N_i^+ = \{j \mid (i,j) \in \mathcal{E}\}$. Since we exclude self-loops from $\mathcal{E}$, $i \notin N_i^-$ and $i \notin N_i^+$.
However, we note again that each agent can indeed send messages to itself. Agent $j$ is said to be
an incoming neighbor of agent $i$ if $j \in N_i^-$. Similarly, $j$ is said to be an outgoing neighbor of agent
$i$ if $j \in N_i^+$.


We say that a function $h: \mathbb{R} \to \mathbb{R}$ is admissible if (i) $h(\cdot)$ is convex and $L$-Lipschitz continuous,
and (ii) the set $\arg\min h(x)$ containing the optima of $h(\cdot)$ is non-empty and compact (i.e., bounded
and closed). Given $k$ admissible input functions $h_1(x), \ldots, h_k(x)$, each agent $i \in \mathcal{V}$ is initially
provided with a local cost function $g_i(\cdot)$ of the form

$$g_i(x) = A_{1i} h_1(x) + A_{2i} h_2(x) + \cdots + A_{ki} h_k(x),$$

where $A_{ji} \ge 0$ and $\sum_{j=1}^{k} A_{ji} = 1$ for all $i \in \mathcal{V}$ and all $j = 1, \ldots, k$. Compactly,
we have $\mathbf{g}(x) = \mathbf{h}(x) A$, where $\mathbf{h}(x) = [h_1(x), h_2(x), \ldots, h_k(x)]$, $\mathbf{g}(x) = [g_1(x), g_2(x), \ldots, g_n(x)]$
and $A \in \mathbb{R}^{k \times n}$. Our problem formulation is motivated by the work on condition-based consensus
[15,6,10], where the inputs of the agents are restricted to be within some acceptable set.

Each agent $i$ maintains state $x_i$, with $x_i(t)$ denoting the local estimate of the optimal $x$,
computed by node $i$ at the end of the $t$-th iteration of the algorithm, with $x_i(0)$ denoting its initial
local estimate. At the start of the $t$-th iteration ($t > 0$), the local estimate of agent $i$ is $x_i(t-1)$.
The algorithms of interest will require each agent $i$ to perform the following three steps in iteration
$t$, where $t > 0$. Note that the faulty agents may deviate from this specification. Since each $h_j(\cdot)$ is
convex and $L$-Lipschitz continuous, and $\sum_{j=1}^{k} A_{ji} = 1$, it follows that each $g_i(\cdot)$ is also convex and
$L$-Lipschitz continuous. Note that the formulation allows $n < k$ as well as $n > k$. The matrix $A$ is
termed a job assignment matrix. The goal here is to develop algorithms that output $x_i = \tilde{x}$ at
each non-faulty agent $i$ such that

$$\tilde{x} \in \arg\min_{x} h(x) \triangleq \frac{1}{k}\sum_{j=1}^{k} h_j(x). \qquad (1)$$


That is, we are interested in developing algorithms in which the local estimates of the non-faulty
agents eventually reach consensus, and the consensus value is an optimum of the function $h(\cdot)$.

Let $X_j = \arg\min h_j(x)$ for all $j = 1, \ldots, k$, and let $X = \arg\min h(x)$. For ease of future
reference, we refer to the above optimization problem (1) as Problem 1. Problem 1 is said to be
solvable if there exists an algorithm that outputs $\tilde{x} \in \arg\min h(x)$ at each non-faulty agent $i$ for any
collection of $k$ admissible functions. Problem 1 can be further formulated differently depending on
whether each non-faulty agent $i$ knows the assignment matrix $A$ or not. We refer to the formulation
where the agents know matrix $A$ as condition-based Byzantine multi-agent optimization with side
information; otherwise the problem is called condition-based Byzantine multi-agent optimization
without side information.


Our formulation is more general than the common formulation adopted in [8,17,18,21,24,25], in
which $f = 0$ and the assignment matrix $A = I_k$ (the identity matrix) is considered. Despite the elegance
of the algorithms proposed in [8,17,18,21,24,25], none of these algorithms work in the presence of
Byzantine agents when $f \ge 1$ and $A = I_k$. Informally speaking, this is because under the assignment
$\mathbf{g}(x) = \mathbf{h}(x) I_k = \mathbf{h}(x)$, the information about the input function $h_i(x)$ is exclusively known to
agent $i$ in the system. If agent $i$ is faulty and misbehaves, or crashes at the beginning of an execution,
then the information about $h_i(x)$ is not accessible to the non-faulty agents. When $h_j(x)$ and […]
have common optima, there does not exist a correct algorithm. A stronger
impossibility result is presented next, which is proved in Part I of our work [23].

Theorem 1. [23] Problem 1 is not solvable when $f \ge 1$ and $A = I_k$.

In contrast, function redundancy can be added to the system by applying a properly chosen job
assignment matrix $A$ to $\mathbf{h}(x)$. For example, suppose $k = 2$, $f = 1$ and the optimal sets of functions
$h_1(x)$ and $h_2(x)$ are $[-1, 0]$ and $[0, 1]$, respectively. Let $\mathbf{g}(x) = \mathbf{h}(x) G$, where $G$ is a generator
matrix of a repetition code with $d = 2f + 1 = 3$. Informally speaking, by applying the linear code $G$ to the
input functions $\mathbf{h}(x)$, i.e., $\mathbf{g}(x) = \mathbf{h}(x) G$, the Byzantine agents' ability to hide information about
the input functions can be weakened. This observation and Theorem 1 together justify our problem
formulation.
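The repetition-code example can be made concrete with a small sketch. The helper names `repetition_generator` and `local_costs` are hypothetical, and the layout of $G$ (consecutive blocks of $2f+1$ identical columns) is one possible choice assumed here for illustration, not a construction prescribed by the report.

```python
def repetition_generator(k, f):
    """Return a k x n generator matrix G (n = k * (2f + 1)) as a list of rows.

    Column block j holds 2f + 1 copies of the j-th unit vector, so every
    column sums to 1 and each agent's local cost g_i equals one input h_j.
    """
    copies = 2 * f + 1
    n = k * copies
    return [[1.0 if col // copies == row else 0.0 for col in range(n)]
            for row in range(k)]


def local_costs(h_values, G):
    """Compute g(x) = h(x) G for a row vector of input-function values h(x)."""
    k, n = len(G), len(G[0])
    return [sum(h_values[j] * G[j][i] for j in range(k)) for i in range(n)]


G = repetition_generator(k=2, f=1)
# Hypothetical values of h_1(x) and h_2(x) at some point x.
g = local_costs([0.25, 4.0], G)
```

With $k = 2$ and $f = 1$, six agents are used and each input function is held by three of them, so a single faulty agent cannot remove all copies of any $h_j$.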

Contributions: We introduce a condition-based approach to the Byzantine multi-agent optimization
problem. Two slightly different variants are considered: condition-based Byzantine multi-agent
optimization with side information and condition-based Byzantine multi-agent optimization without
side information. For the former, when side information is available at each agent, a decoding-based
algorithm is proposed, assuming that each input function is differentiable. This algorithm combines
the gradient method with the decoding procedure introduced in [4] (namely matrix $A$). With such
a decoding subroutine, our algorithm essentially performs the gradient method, where the gradient
computation is distributed over the multi-agent system. When side information is not
available at each agent, we propose a simple consensus-based algorithm in which each agent
carries minimal state across iterations. This consensus-based algorithm solves Problem 1 under the
additional assumption that all input functions share at least one common
optimum.

Organization: The rest of the report is organized as follows. Related work is summarized in 
Section 2. Condition-based Byzantine multi-agent optimization with side information is analyzed 
in Section 3, where each agent knows the assignment matrix A. Section 4 is devoted to the case 
when each agent does not know A. Section 5 concludes the report. 

2 Related Work 

Fault-tolerant consensus [19] is closely related to the optimization problem considered in this report.
There is a significant body of work on fault-tolerant consensus, including [7,6,14,9,12,27,10]. Two
variants that are most relevant to the algorithms in this report are iterative approximate Byzantine
consensus [9,12,27] and condition-based consensus [15,6,10]. Iterative approximate consensus
requires that the agents agree with each other only approximately, using local communication and
maintaining minimal state across iterations. Condition-based consensus [15] restricts the inputs of
the agents to be within some acceptable set. [6] showed that if a condition (the set of allowable
system inputs) is $f$-acceptable, then consensus can be achieved in the presence of up to $f$ crash
failures over complete graphs. A connection between asynchronous consensus and error-correcting
codes (ECC) was established in [10], observing that crash failures and Byzantine failures correspond
to erasures and substitution errors, respectively, in ECCs. The condition-based approach can also be
used in synchronous systems to speed up agreement [16,15].

Convex optimization, including distributed convex optimization, also has a long history [2].
Primal and dual decomposition methods that lend themselves naturally to a distributed paradigm
are well-known [3]. There has been significant research on a variant of the distributed optimization
problem [8,17,18,21,24,25], in which the global objective $h(x)$ is a summation of $n$ convex
functions, i.e., $h(x) = \sum_{j=1}^{n} h_j(x)$, with function $h_j(x)$ being known to the $j$-th agent. The need for
robustness in distributed optimization problems has received some attention recently [8,17,21].
In particular, Ram et al. [21] studied the scenario when each component function is known
partially (with stochastic errors) to an agent, and Duchi et al. [8] and Nedic et al. [17] investigated the
impact of random communication link failures and time-varying communication topology. Duchi
et al. [8] assumed that each realizable link failure pattern admits a
doubly-stochastic matrix which governs the evolution dynamics of the local estimates of the optimum.
The doubly-stochastic requirement is relaxed in [17], using the push-sum technique used in [24]. In
contrast, we consider the system in which up to $f$ agents may be Byzantine, i.e., up to $f$ agents
may be adversarial and try to mislead the system into functioning improperly. We are not aware of
prior work that obtains the results presented in this report.

In other related work, significant attempts have been made to solve the problem of distributed
hypothesis testing in the presence of Byzantine attacks [11,29,13], where Byzantine sensors may
transmit fictitious observations aimed at confusing the decision maker into arriving at a judgment that
contradicts the true underlying distribution. A consensus-based variant of distributed event
detection, where a centralized data fusion center does not exist, is considered in [11]. In contrast,
in this paper, we focus on Byzantine attacks on the multi-agent optimization problem.

3 Condition-based Byzantine multi-agent optimization with side information

In this section we consider condition-based Byzantine multi-agent optimization with side information,
where each agent knows the assignment matrix $A$. Let $\{\alpha(t)\}_{t=0}^{\infty}$ be a sequence of step sizes.
A simple decoding-based algorithm, Algorithm 1, formally presented below, works in an iterative
fashion. Recall that $x_i(0)$ is the initial local estimate of each non-faulty agent $i \in \mathcal{V} - \mathcal{F}$,
and $G(\mathcal{V}, \mathcal{E})$ is the underlying communication graph. Without loss of generality, we assume that
$x_i(0) = x_0$ for $i \in \mathcal{V} - \mathcal{F}$ and some arbitrary but fixed $x_0 \in \mathbb{R}$. Otherwise, we can add an
additional initialization step to guarantee an identical "initial state" using an arbitrary exact consensus
algorithm. Let $x_i(t)$ be the local estimate of an optimum in $X$, computed by node $i$ at the end of
the $t$-th iteration of the algorithm. At the start of the $t$-th iteration ($t > 0$), the local estimate of
agent $i$ is $x_i(t-1)$.

For Algorithm 1 to work, we assume that each input function $h_j(\cdot)$ is differentiable.
Consequently, the local objective $g_i(\cdot)$ is also differentiable for each $i \in \mathcal{V}$. Let $A \in \mathbb{R}^{k \times n}$ be a matrix
that can correct up to $f$ arbitrary entry-wise errors, as in [4]. At iteration $t$, each non-faulty agent $i$
computes the gradient of $g_i(\cdot)$ at $x_i(t-1)$. Let $\mathbf{d}(t)$ be the $k$-dimensional vector of the gradients of
the $k$ input functions at $x_i(t-1)$, where $i \in \mathcal{V} - \mathcal{F}$. For the $j$-th entry in $\mathbf{d}(t)$, i.e., $d_j(t)$, it holds
that $d_j(t) = h_j'(x_i(t-1))$. Later we will show that $x_i(t-1) = x_j(t-1)$ for all $i, j \in \mathcal{V} - \mathcal{F}$. Thus
$\mathbf{d}(t)$ is well-defined. In addition, we assume the structure of the underlying graph $G(\mathcal{V}, \mathcal{E})$ admits
Byzantine broadcast. For instance, when $G(\mathcal{V}, \mathcal{E})$ is undirected, for a correct Byzantine broadcast
algorithm to exist, the node connectivity of $G(\mathcal{V}, \mathcal{E})$ must be at least $2f + 1$.


Algorithm 1:
Steps to be performed by agent $i \in \mathcal{V}$ in iteration $t > 0$.

Initialization: $x_i(0) \leftarrow x_0$.

1. Transmit step: Compute $g_i'(x_i(t-1))$, the gradient of $g_i(\cdot)$ at $x_i(t-1)$, and perform Byzantine
broadcast of $g_i'(x_i(t-1))$ to all agents.

2. Receive step: Receive gradients from all other agents. Let $\mathbf{y}^i(t)$ be an $n$-dimensional vector
of received gradients, with $y^i_j(t)$ being the value received from agent $j$. If $j \in \mathcal{V} - \mathcal{F}$, then
$y^i_j(t) = g_j'(x_j(t-1))$.

3. Gradient Decoding step: Perform the decoding procedure in [4] to recover

$$\mathbf{d}(t) = [h_1'(x_i(t-1)), \ldots, h_k'(x_i(t-1))].$$

4. Update step: Update its local estimate as follows.

$$x_i(t) = x_i(t-1) - \alpha(t-1) \sum_{j=1}^{k} h_j'(x_i(t-1)). \qquad (2)$$
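The four steps of Algorithm 1 can be simulated in a few lines. The following is a minimal sketch under several assumptions that are not part of the report: the assignment matrix is the repetition code with $2f+1$ copies per input function, the decoding procedure of [4] is replaced by a per-block median (which suffices only for this repetition layout), Byzantine broadcast is modeled by a single vector that every non-faulty agent receives identically, and the input functions, the faulty index and the step sizes are illustrative choices.

```python
import statistics

F = 1                      # assumed bound on Byzantine agents
K = 2                      # number of input functions
COPIES = 2 * F + 1         # repetition factor; n = K * COPIES agents
FAULTY = {4}               # hypothetical index of the faulty agent


def input_gradients(x):
    """Gradients of the illustrative inputs h_1(x) = (x-1)^2 and h_2(x) = (x+1)^2."""
    return [2.0 * (x - 1.0), 2.0 * (x + 1.0)]


def broadcast_round(x, t):
    """Vector y(t) that Byzantine broadcast delivers to every non-faulty agent."""
    d = input_gradients(x)
    y = [d[i // COPIES] for i in range(K * COPIES)]   # y = d A for repetition A
    for p in FAULTY:
        y[p] = 1e6 * (-1) ** t                        # arbitrary faulty value
    return y


def decode(y):
    """Recover d(t): the median of 2f+1 copies ignores up to f corrupted ones."""
    return [statistics.median(y[j * COPIES:(j + 1) * COPIES]) for j in range(K)]


def run(x0, iterations):
    """Iterate update (2) from the common initial estimate x0."""
    x = x0
    for t in range(1, iterations + 1):
        d = decode(broadcast_round(x, t))
        alpha = 1.0 / (t + 4)                         # alpha(t-1), diminishing
        x = x - alpha * sum(d)                        # update (2)
    return x


x_final = run(x0=5.0, iterations=50)
```

Because every non-faulty agent decodes the same vector, all of them follow the same trajectory, which is why a single scalar `x` is enough in this sketch. The quadratic inputs are Lipschitz only on a bounded region, so they illustrate the mechanics rather than satisfy the admissibility assumption globally.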


At iteration $t = 1$, each non-faulty agent $i$ computes $g_i'(x_i(0))$, the gradient of $g_i(\cdot)$ at the current
estimate $x_i(0) = x_0$, and performs Byzantine broadcast of $g_i'(x_0)$. Note that a faulty agent $p$, instead
of $g_p'(x_0)$, may perform Byzantine broadcast of some arbitrary value to other agents. Recall that
$\mathbf{y}^i(1) \in \mathbb{R}^n$ is an $n$-dimensional real vector, with $y^i_j(1)$ being the value received from agent $j$ at iteration
1. Since $\mathbf{g}(\cdot) = \mathbf{h}(\cdot) A$, we can write $\mathbf{y}^i(1)$ as $\mathbf{y}^i(1) = \mathbf{d}(1) A + \mathbf{e}^i(1)$, where $\mathbf{e}^i(1)$ corresponds
to the errors induced by the faulty agents. Let $p$ index a nonzero entry in $\mathbf{e}^i(1)$; it should be noted
that $e^i_p(1)$ can be arbitrarily far from 0. Since messages/values are transmitted via Byzantine
broadcast, it holds that $\mathbf{e}^i(1) = \mathbf{e}^{i'}(1)$ for all $i, i' \in \mathcal{V} - \mathcal{F}$. Consequently, we have $\mathbf{y}^i(1) = \mathbf{y}^{i'}(1)$
for all $i, i' \in \mathcal{V} - \mathcal{F}$. For each $i \in \mathcal{V} - \mathcal{F}$, $\mathbf{d}(1)$ can be recovered using the decoding procedure in
[4]. By the updating function (2), we know $x_i(1) = x_j(1)$ for all $i, j \in \mathcal{V} - \mathcal{F}$. Inductively, it can be
shown that $x_i(t) = x_j(t)$ for all $i, j \in \mathcal{V} - \mathcal{F}$, and for all $t > 0$. Thus, $\mathbf{d}(t)$ is well-defined for each
$t > 0$. The remaining correctness proof of Algorithm 1 follows directly from the standard gradient
method convergence analysis for convex objectives.

Due to the use of Byzantine broadcast, the communication load in Algorithm 1 is high. The
communication cost can be reduced by using a matrix $A$ that has stronger error-correction ability.
In general, there is a tradeoff among the communication cost, the graph structure and the
error-correcting capability of $A$. The main focus of this paper is the case when no side information
is available at each agent, so we do not pursue this tradeoff further.

4 Condition-based Byzantine multi-agent optimization without side 
information 

In this section, we consider the scenario when side information about the assignment matrix $A$ is not
known to each agent. We will classify the collection of input functions into three classes depending
on the level of redundancy in the input function solutions. For functions with adequate redundancy
in their optima, a simple consensus-based algorithm, named Algorithm 2, is proposed. Although
Algorithm 2, at least in its current form, only works for a restricted class of input functions, it is
more efficient in terms of both memory and local computation, compared to Algorithm 1. We leave
the adaptation of Algorithm 2 to general input functions as future work.

The job assignment matrices used in this section are characterized by the sparsity parameter, a new
property (introduced in this report) of matrices.

4.1 Classification of input functions collections 

Recall that protective function redundancy is added to the system by applying a proper matrix
$A$ to $\mathbf{h}(\cdot)$, i.e., $\mathbf{g}(\cdot) = \mathbf{h}(\cdot) A$. In Algorithm 1, sufficient redundancy is added to the system such
that Algorithm 1 works for any collection of input functions. However, for some collections of input
functions, such function redundancy may not be necessary. Consider the case when all $k$ input
functions are strictly convex and have the same optimum, i.e., $X_i = \{x^*\}$ for some $x^*$ and for
all $i = 1, \ldots, k$. In addition, the agents know that $X_i = X_j$ and $|X_i| = 1$ for all $i, j \in \mathcal{V}$. It can
be checked that $h(x) = \frac{1}{k}\sum_{j=1}^{k} h_j(x)$ is also strictly convex and $X = \{x^*\}$. Even if there are no
redundant agents in the system and no redundancy is added when applying $A$, i.e., $A = I_k$, Problem
1 can be solved trivially by requiring each non-faulty agent to minimize its own local objective
$h_i(x)$ individually without exchanging any information with other agents.

Informally speaking, as suggested by the above example, the optimal sets of the given input 
functions may themselves have redundancy. For ease of further reference, we term this redundancy 
as solution redundancy. Closer examination reveals that the collections of input functions can be 
categorized into three classes according to solution redundancy. 

Case 1: The $k$ input functions are strictly convex and $X_i = \{x^*\}$ for all $i = 1, \ldots, k$, and the agents
know that $X_i = X_j$ and $|X_i| = 1$ for all $i, j \in \mathcal{V}$;

Case 2: The $k$ input functions share at least one common optimum, i.e., $\cap_{i=1}^{k} X_i \neq \emptyset$, and the
agents know that $\cap_{i=1}^{k} X_i \neq \emptyset$;

Case 3: The $k$ input functions share no optima, i.e., $\cap_{i=1}^{k} X_i = \emptyset$.

If the collection of k input functions belongs to Case 1 or Case 2, we refer to this scenario as 
solution-redundant functions; similarly, we refer to the collection of k functions that falls within 
Case 3 as solution-independent functions. When the given collection of input functions fits Case 2 
or Case 3 (but not Case 1), information exchange among agents is in general required in order to 
achieve asymptotic consensus over local estimates of non-faulty agents. 

In this section, we are particularly interested in the family of algorithms with the following
structure.

4.2 Algorithm Structure 

Recall that each agent $i$ maintains state $x_i$, with $x_i(t)$ denoting the local estimate of an optimum
in $X$, computed by node $i$ at the end of the $t$-th iteration of the algorithm, with $x_i(0)$ denoting
its initial local estimate. At the start of the $t$-th iteration ($t > 0$), the local estimate of agent $i$ is
$x_i(t-1)$. The algorithms of interest will require each agent $i$ to perform the following three steps
in iteration $t$, where $t > 0$. Note that the faulty agents may deviate from this specification.

1. Transmit step: Transmit message $m_i(t)$ on all outgoing edges (to agents in $N_i^+$).

2. Receive step: Receive messages on all incoming edges (from agents in $N_i^-$). Denote by $r_i(t)$ the
vector of messages received from its neighbors.

3. Update step: Agent $i$ updates its local estimate using a transition function $Z_i$,

$$x_i(t) = Z_i\left(r_i(t), x_i(t-1), g_i(\cdot)\right), \qquad (3)$$

where $Z_i$ is a part of the specification of the algorithm.

The evolution of the local estimate at agent $i$ is governed by the update function defined in (3). Note
that $x_i(t)$ only depends on the local objective $g_i(\cdot)$, $x_i(t-1)$ and $r_i(t)$, the messages collected by agent
$i$ in the receive step of iteration $t$. No other information collected in any of the previous iterations
will affect the update step in iteration $t$. Intuitively speaking, non-faulty agent $i$ is assumed to have
no memory across iterations except $x_i$. Note that the information available at each non-faulty node
$i \in \mathcal{V} - \mathcal{F}$ is the local estimate $x_i(t-1)$ and the local objective $g_i(\cdot)$. Thus, the message $m_i(t)$ is a
function of $x_i(t-1)$ and $g_i(\cdot)$ only, i.e.,

$$m_i(t) = \Gamma_i\left(x_i(t-1), g_i(\cdot)\right).$$


An algorithm is said to be correct (1) if $\lim_{t \to \infty} |x_i(t) - x_j(t)| = 0$ and $\lim_{t \to \infty} x_i(t) \in X$
for all initial states $x_i(0)$ and for all $i, j \in \mathcal{V} - \mathcal{F}$, and (2) if there exists a finite $t_0$ such that
$x_i(t_0) = x_j(t_0)$ and $x_i(t_0) \in X$ for all $i, j \in \mathcal{V} - \mathcal{F}$, then $x_i(t) = x_i(t_0)$ for all $i \in \mathcal{V} - \mathcal{F}$ and for all
$t \ge t_0$.

Case 1 above is a special form of Case 2. For Case 1, where the $h_j(x)$'s are strictly convex and
have the same optimum, the problem can be solved trivially. However, for Case 2 in general,
the redundancy that is necessary may depend on the underlying graph structure. Henceforth, we
consider the scenario when the input functions fall in Case 2. Note that Theorem 1 still holds
when restricting to Case 2 input functions. Next, we introduce the notion of the sparsity parameter
of a job assignment matrix, and characterize the tradeoff between the sparsity parameter and the
necessary and sufficient condition for a correct algorithm to exist.

Definition 1. Given a job assignment matrix $A$, the sparsity parameter of $A$, denoted by $sp(A)$,
is the smallest integer such that the sum vector of any $sp(A)$ columns of $A$ is component-wise
positive, i.e., every coordinate of the sum vector is positive. In particular, if the sum vector of all
columns of $A$ is not component-wise positive, then $sp(A) = n + 1$ by convention.

Recall that $A \ge 0$ is a nonnegative matrix, so $sp(A) = n + 1$ implies that there exists a row in $A$
that contains only zeros. The following lemma presents a lower bound on the number of nonzero
elements in a row of $A$, given that $sp(A) = k'$.

Lemma 1. Given an assignment matrix $A$, its sparsity parameter $sp(A) = k'$ if and only if there
are at most $k' - 1$ zero entries in each row of $A$ and there exists one row that contains exactly $k' - 1$
zero entries.

Lemma 1 is proved in Appendix B.

The sparsest assignment matrix $A$ with $sp(A) = k'$ can be constructed by choosing arbitrary
$k' - 1$ entries in each row to be zero. By the proof of Lemma 1, it can be checked that the sparsity
parameter of the obtained matrix $A$ is $k'$. In addition, the total number of non-zero entries in $A$ is
$(n - k' + 1)k$.
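Definition 1 and Lemma 1 give two ways to compute the sparsity parameter, and comparing them on a small matrix is a quick sanity check. In the sketch below, the function names and the example matrix are hypothetical; the brute-force routine enumerates column subsets and is only meant for tiny inputs.

```python
from itertools import combinations


def sparsity_parameter(A):
    """sp(A) via Lemma 1: one more than the largest number of zeros in any row."""
    return 1 + max(sum(1 for entry in row if entry == 0) for row in A)


def sparsity_parameter_bruteforce(A):
    """sp(A) straight from Definition 1 (exponential time; for small checks only)."""
    k, n = len(A), len(A[0])
    for size in range(1, n + 1):
        if all(all(sum(A[row][c] for c in cols) > 0 for row in range(k))
               for cols in combinations(range(n), size)):
            return size
    return n + 1  # convention when some row of A is all zeros


# Hypothetical 2 x 4 assignment matrix; every column sums to 1.
A = [[1.0, 0.5, 0.0, 0.0],
     [0.0, 0.5, 1.0, 1.0]]
```

Here the first row has two zeros and the second has one, so both routines give $sp(A) = 3$: any three columns cover both input functions, while columns 3 and 4 alone carry no information about $h_1$.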


4.3 Terminology of Consensus 

Our condition is based on characterizing a special class of subgraphs of $G(\mathcal{V}, \mathcal{E})$, termed reduced
graphs [27], formally defined below.

Definition 2. [27] For a given graph $G(\mathcal{V}, \mathcal{E})$, a reduced graph $\mathcal{H}$ is a subgraph of $G(\mathcal{V}, \mathcal{E})$ obtained
by (i) removing all the faulty agents from $\mathcal{V}$ along with their edges; (ii) removing any additional up
to $f$ incoming edges at each non-faulty agent.

Let us denote the collection of all the reduced graphs for a given $G(\mathcal{V}, \mathcal{E})$ by $R_{\mathcal{F}}$. Thus, $\mathcal{V} - \mathcal{F}$ is
the set of agents in each element of $R_{\mathcal{F}}$. Let $\tau = |R_{\mathcal{F}}|$. It is easy to see that $\tau$ depends on $\mathcal{F}$ as well
as the underlying network $G(\mathcal{V}, \mathcal{E})$, and it is finite.

Definition 3. A source component¹ $S$ of a given graph $G(\mathcal{V}, \mathcal{E})$ is the collection of agents each of
which has a directed path to every other agent in $G(\mathcal{V}, \mathcal{E})$.

It can be easily checked that the source component $S$, if any, is a strongly connected component
in $G(\mathcal{V}, \mathcal{E})$. In addition, a graph contains at most one source component.
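The source component of Definition 3 can be found by a reachability search from every agent. The sketch below is one straightforward way to do so; the function names and the four-agent example graph are hypothetical, and applying it to a reduced graph would first require removing the faulty agents and up to $f$ incoming edges per agent, which is not shown.

```python
from collections import deque


def reachable_from(start, out_neighbors):
    """Nodes reachable from `start` along directed edges (including `start`)."""
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in out_neighbors.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def source_component(nodes, edges):
    """Agents with a directed path to every other agent (Definition 3)."""
    out_neighbors = {u: [] for u in nodes}
    for u, v in edges:
        out_neighbors[u].append(v)
    everyone = set(nodes)
    return {u for u in nodes if reachable_from(u, out_neighbors) == everyone}


# Hypothetical 4-agent graph: agents 1 and 2 form a cycle that feeds 3 and 4.
S = source_component([1, 2, 3, 4], [(1, 2), (2, 1), (2, 3), (3, 4)])
```

In this example the source component is $\{1, 2\}$; a graph in which some agent is unreachable from every other agent has an empty source component.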

4.4 Necessary Condition 

We now present a necessary condition on the underlying communication graph $G(\mathcal{V}, \mathcal{E})$ for solving
Problem 1. Our necessary condition is based on characterizing the connectivity of each reduced
graph of $G(\mathcal{V}, \mathcal{E})$.

Theorem 2. Given a graph $G(\mathcal{V}, \mathcal{E})$, if there exists a correct algorithm that can solve Problem 1
when the agents do not have knowledge of the matrix, under any assignment matrix $A$ for any $k$
solution-redundant input functions, then a source component must exist containing at least
$\max\{f + 1, sp(A)\}$ nodes.

The proof of Theorem 2 can be found in Appendix B. 

For future reference, we term the necessary condition in Theorem 2 as Condition 1. Condition 
1 also implies a lower bound on the number of agents needed, stated below. 

Corollary 1. For a given graph $G(\mathcal{V}, \mathcal{E})$, if Condition 1 is true, then $n \ge \max\{sp(A) + 2f, 3f + 1\}$.

It can be shown that this lower bound is indeed tight. For instance, consider the complete graph of size
$sp(A) + 2f$, denoted by $K_{sp(A)+2f}$. It can be easily proved by contradiction that $K_{sp(A)+2f}$ satisfies
Condition 1. The proof of Corollary 1 is presented in Appendix B.

4.5 Sufficiency of Condition 1 

Let $\{\alpha(t)\}_{t=0}^{\infty}$ be a sequence of step sizes such that $\alpha(t+1) \le \alpha(t)$ for all $t \ge 0$,
$\sum_{t=0}^{\infty} \alpha(t) = \infty$, and $\sum_{t=0}^{\infty} \alpha^2(t) < \infty$. We show that Condition 1 is also sufficient.
Let $\phi = |\mathcal{F}|$. Thus $\phi \le f$. Without
loss of generality, let us assume that the non-faulty agents are indexed as 1 to $n - \phi$. Recall that
the system is synchronous. If a non-faulty agent does not receive an expected message from an
incoming neighbor (in the Receive step below), then that message is assumed to have some default
value. With the exception of the update step (4) below, the algorithm is similar to the consensus
algorithms in [27,26,18].


¹ The definition of a source is different from [28].
Algorithm 2

Steps to be performed by agent $i \in \mathcal{V} - \mathcal{F}$ in the $t$-th iteration:

1. Transmit step: Transmit current state $x_i(t-1)$ on all outgoing edges.

2. Receive step: Receive values on all incoming edges. These values form multiset² $r_i(t)$ of size $|N_i^-|$.

3. Update step: Sort the values in $r_i(t)$ in increasing order, and eliminate the smallest $f$ values
and the largest $f$ values (breaking ties arbitrarily). Let $N_i^*(t)$ denote the identifiers of agents
from whom the remaining $|N_i^-| - 2f$ values were received, and let $w_j$ denote the value received
from agent $j \in N_i^*(t)$. For convenience, define $w_i = x_i(t-1)$.³

Update its state as follows.

$$x_i(t) = \sum_{j \in \{i\} \cup N_i^*(t)} a_i w_j - \alpha(t-1)\, d_i(t-1), \qquad (4)$$

where $a_i = \frac{1}{|N_i^-| - 2f + 1}$ and $d_i(t-1)$ is a gradient of agent $i$'s objective function $g_i(\cdot)$ at
$x = x_i(t-1)$.
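The update step (4) amounts to a trimmed average followed by a gradient correction. The following sketch shows one agent's computation for a single iteration; the function name, argument names and numeric inputs are hypothetical, and ties are broken by the sort order rather than by any particular rule.

```python
def algorithm2_update(own_value, received, f, step, gradient):
    """One update step (4) at a non-faulty agent.

    own_value : x_i(t-1)
    received  : values received on incoming edges, one per incoming neighbor
    f         : bound on the number of Byzantine agents
    step      : step size alpha(t-1)
    gradient  : d_i(t-1), a gradient of g_i at x_i(t-1)
    """
    if len(received) < 2 * f:
        raise ValueError("need at least 2f incoming values to trim")
    ordered = sorted(received)
    kept = ordered[f:len(ordered) - f]      # drop f smallest and f largest
    weight = 1.0 / (len(received) - 2 * f + 1)
    return weight * (own_value + sum(kept)) - step * gradient


x_next = algorithm2_update(own_value=1.0,
                           received=[0.0, 2.0, 100.0, -50.0, 1.5],
                           f=1, step=0.1, gradient=2.0)
```

With five incoming values and $f = 1$, the extreme values $-50$ and $100$ are discarded, the weight is $a_i = 1/4$, and the new estimate is $\frac{1}{4}(1 + 0 + 1.5 + 2) - 0.1 \cdot 2 = 0.925$. The agent keeps nothing from this iteration other than the new estimate.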


Recall that $i \notin N_i^*(t)$ because $(i,i) \notin \mathcal{E}$. The "weight" of each term on the right-hand side of
(4) is $a_i$, and these weights add to 1. Observe that $0 < a_i \le 1$. Let $\mathbf{x} \in \mathbb{R}^{n-\phi}$ be a real vector of
dimension $n - \phi$, with $x_i$ being the local estimate of agent $i$, $\forall\, i \in \mathcal{V} - \mathcal{F}$. Thus, $\mathbf{x}(t)$ is a vector of
the local estimates of non-faulty agents at iteration $t$.

Since $G(\mathcal{V}, \mathcal{E})$ satisfies Condition 1, as shown in [26], the updates of $\mathbf{x} \in \mathbb{R}^{n-\phi}$ in each iteration
can be written compactly in a matrix form.

$$\mathbf{x}(t+1) = M(t)\mathbf{x}(t) - \alpha(t)\mathbf{d}(t). \qquad (5)$$

The construction of $M(t)$ and relevant properties are given in [26] and are also presented in
Appendix C for completeness. Let $\mathcal{H} \in R_{\mathcal{F}}$ be a reduced graph of the given graph $G(\mathcal{V}, \mathcal{E})$ with $H$ as
adjacency matrix. It is shown that in every iteration $t$, and for every $M(t)$, there exists a reduced
graph $\mathcal{H}(t) \in R_{\mathcal{F}}$ with adjacency matrix $H(t)$ such that

$$M(t) \ge \beta H(t), \qquad (6)$$

where $0 < \beta \le 1$ is a constant. The definition of $\beta$ can be found in [26]. Equation (5) can be further
expanded out as

$$\mathbf{x}(t+1) = \Phi(t,0)\mathbf{x}(0) - \sum_{r=1}^{t+1} \alpha(r-1)\Phi(t,r)\mathbf{d}(r-1), \qquad (7)$$

where $\Phi(t,r) = M(t)M(t-1)\cdots M(r)$ and by convention $\Phi(t,t) = M(t)$ and $\Phi(t,t+1) = I_{n-\phi}$,
the identity matrix. Note that $\Phi(t,r)$ is a backward product (i.e., the index decreases from left
to right in the product).
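To see what a backward product looks like numerically, the sketch below multiplies a sequence of row-stochastic matrices with newer matrices on the left and measures how far the rows of the result are from being identical. The two matrices are hypothetical stand-ins chosen for illustration; they are not the matrices $M(t)$ constructed in [26], and the helper names are assumptions.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    size = len(P)
    return [[sum(P[i][m] * Q[m][j] for m in range(size)) for j in range(size)]
            for i in range(size)]


def backward_product(matrices):
    """Phi(t, r) = M(t) M(t-1) ... M(r), given matrices = [M(r), ..., M(t)]."""
    product = matrices[0]
    for M in matrices[1:]:
        product = matmul(M, product)   # newer matrices multiply on the left
    return product


def row_spread(P):
    """Largest gap between two entries of one column; 0 means identical rows."""
    return max(max(col) - min(col) for col in zip(*P))


# Hypothetical row-stochastic matrices, alternated over 40 iterations.
M_A = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.5, 0.0, 0.5]]
M_B = [[0.6, 0.2, 0.2], [0.2, 0.6, 0.2], [0.2, 0.2, 0.6]]
Phi = backward_product([M_A if t % 2 == 0 else M_B for t in range(40)])
spread = row_spread(Phi)
```

For these matrices the rows of the product become numerically indistinguishable after a few dozen factors, which is the rank-one limiting behavior that the convergence analysis relies on.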

² In a multiset, multiple instances of an element are allowed. For instance, {1, 1, 2} is a multiset.

³ Observe that if $j \in \{i\} \cup N_i^*(t)$ is non-faulty, then $w_j = x_j(t-1)$.









Convergence of the Transition Matrices $\Phi(t, r)$. It can be seen from (7) that the evolution
of the estimates of non-faulty agents $\mathbf{x}(t)$ is determined by the backward product $\Phi(t, r)$. Thus, we
first characterize the evolutional properties and limiting behaviors of the backward product $\Phi(t, r)$,
assuming that the given $G(\mathcal{V}, \mathcal{E})$ satisfies Condition 1.

Let $k' = sp(A)$. The following lemma describes the structural property of $\Phi(t, r)$ for sufficiently
large $t$. For a given $r$, Lemma 2 states that all non-faulty agents will be influenced by at least
$\max\{k', f + 1\}$ common non-faulty agents, and this set of influencing agents may depend on $r$.
The proof of Lemma 2 can be found in Appendix D.

Lemma 2. There are at least $\max\{sp(A), f + 1\}$ columns in $\Phi(r + \nu - 1, r)$ that are lower bounded
by $\beta^{\nu}\mathbf{1}$ component-wise for all $r$, where $\mathbf{1} \in \mathbb{R}^{n-\phi}$ is an all-one column vector of dimension $n - \phi$.

Using coefficients of ergodicity, it is shown in [26] that if the given graph $G(\mathcal{V}, \mathcal{E})$ satisfies
Condition 1, then $\Phi(t, r)$ is weakly ergodic. Moreover, because weak ergodicity is equivalent to strong
ergodicity for backward products of stochastic matrices [5], as $t \to \infty$ the limit of $\Phi(t, r)$ exists:

$$\lim_{t \ge r,\ t \to \infty} \Phi(t, r) = \mathbf{1}\pi(r), \qquad (8)$$

where $\pi(r) \in \mathbb{R}^{n-\phi}$ is a row stochastic vector (which may depend on $r$). It is shown in [1], using ergodic
coefficients, that the rate of convergence in (8) is exponential, as formally stated in
Theorem 3. Recall that $\tau = |R_{\mathcal{F}}|$, $n - \phi$ is the total number of non-faulty agents, and $0 < \beta \le 1$ is
a constant for which (6) holds.

Theorem 3. [1] Let $\nu = \tau(n - \phi)$ and $\gamma = 1 - \beta^{\nu}$. For any sequence $\Phi(t, r)$,

$$|\Phi_{ij}(t, r) - \pi_j(r)| \le [\ldots] \qquad (9)$$

for all $t \ge r$.

Our next lemma is an immediate consequence of Lemma 2 and the convergence of $\Phi(t, r)$ stated
in (8).

Lemma 3. For any fixed $r$, there exists a subset $I_r \subseteq \mathcal{V} - \mathcal{F}$ such that $|I_r| \ge \max\{sp(A), f + 1\}$
and for each $i \in I_r$,

$$\pi_i(r) \ge \beta^{\nu}.$$

The proof of Lemma 3 can be found in Appendix D.


Convergence Analysis of Algorithm 2. Here, we study the convergence behavior of Algorithm
2. The structure of our convergence proof is rather standard and is also adopted in [8,18,21,24,25].
We have shown that the evolution dynamics of $\mathbf{x}(t)$ is captured by (5) and (7). Suppose that all
agents, both non-faulty and faulty, cease computing $d_i(t)$ after some time $\bar{t}$, i.e., after
$\bar{t}$ the subgradient is replaced by 0.

Let {x(t)} be the sequences of local estimates generated by the non-faulty agents in this case. 
From (7) we get 


x(t) = x(t) 



11 


for all t < t. From (5) and (7), we have for all s > 0, it holds that 

t 

x(f+* + l) = ^((,0)x(0)-j; a{r — l)$(t + s, r)d(r — 1). (10) 

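Note that the summation on the RHS of (10) is over $\bar{t}$ terms, since all agents cease computing $d_j(t)$
starting from iteration $\bar{t}$. As $s \to \infty$, we have

$$\begin{aligned}
\lim_{s \to \infty} \bar{x}(\bar{t} + s + 1)
&= \lim_{s \to \infty} \Phi(\bar{t} + s, 0)\, x(0) - \lim_{s \to \infty} \sum_{r=1}^{\bar{t}} \alpha(r - 1)\, \Phi(\bar{t} + s, r)\, d(r - 1) \\
&= \lim_{s \to \infty} \Phi(\bar{t} + s, 0)\, x(0) - \left( \sum_{r=1}^{\bar{t}} \alpha(r - 1) \lim_{s \to \infty} \Phi(\bar{t} + s, r)\, d(r - 1) \right) \\
&= \mathbf{1}\pi(0)\, x(0) - \left( \sum_{r=1}^{\bar{t}} \alpha(r - 1)\, \mathbf{1}\pi(r)\, d(r - 1) \right) \\
&= \left( \langle \pi(0), x(0) \rangle - \sum_{r=1}^{\bar{t}} \alpha(r - 1) \langle \pi(r), d(r - 1) \rangle \right) \mathbf{1}, \qquad (11)
\end{aligned}$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product of two vectors of proper dimension. Let $\mathbf{y}(\bar{t})$ denote
the limiting vector of $\bar{x}(\bar{t} + s + 1)$ as $s \to \infty$. Since all entries in the limiting vector are identical,
we denote the identical value by $y(\bar{t})$. Thus, $\mathbf{y}(\bar{t}) = [y(\bar{t}), \ldots, y(\bar{t})]'$.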

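From (11) we have

$$y(\bar{t}) = \langle \pi(0), x(0) \rangle - \sum_{r=1}^{\bar{t}} \alpha(r - 1) \langle \pi(r), d(r - 1) \rangle. \qquad (12)$$

If, instead, all agents cease computing $d_i(t)$ after iteration $\bar{t} + 1$, then the identical value, denoted
by $y(\bar{t} + 1)$, equals

$$\begin{aligned}
y(\bar{t} + 1) &= \langle \pi(0), x(0) \rangle - \sum_{r=1}^{\bar{t}+1} \alpha(r - 1) \langle \pi(r), d(r - 1) \rangle \\
&= \langle \pi(0), x(0) \rangle - \sum_{r=1}^{\bar{t}} \alpha(r - 1) \langle \pi(r), d(r - 1) \rangle - \alpha(\bar{t}) \langle \pi(\bar{t} + 1), d(\bar{t}) \rangle \\
&= y(\bar{t}) - \alpha(\bar{t}) \langle \pi(\bar{t} + 1), d(\bar{t}) \rangle, \qquad (13)
\end{aligned}$$

where each entry $d_i(\bar{t})$ in $d(\bar{t})$ denotes the subgradient of $g_i(\cdot)$ computed by agent $i$ at $x_i(\bar{t})$. With
a little abuse of notation, henceforth we use $t$ to replace $\bar{t}$. The actual reference of $t$ should be clear
from the context.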
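The definition of the limiting consensus value can be checked on a toy instance. The sketch below is an illustration under simplifying assumptions that are not part of the report: there are no faulty agents, the update matrix $M$ is fixed over time (so $\pi(r) = \pi$ for all $r$), the local costs are the hypothetical functions $g_i(x) = |x - c_i|$, and the step size is $\alpha(t) = 1/(t+1)$. It tracks $y$ through the recursion (13), then freezes the subgradients at iteration $\bar{t}$ and lets the agents run pure consensus.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def update(M, x, alpha=0.0, d=None):
    """One step x(t+1) = M x(t) - alpha(t) d(t); d=None means subgradients are frozen at 0."""
    d = d if d is not None else [0.0] * len(x)
    return [dot(M[i], x) - alpha * d[i] for i in range(len(x))]


def stationary(M, iters=500):
    """Left stochastic eigenvector pi of M (pi M = pi), by power iteration."""
    n = len(M)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * M[i][j] for i in range(n)) for j in range(n)]
    return pi


CENTERS = [0.0, 1.0, 3.0]  # hypothetical local costs g_i(x) = |x - c_i|


def subgrad(i, x):
    """A subgradient of g_i at x (0 is chosen at the kink)."""
    return float((x > CENTERS[i]) - (x < CENTERS[i]))


M = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.1, 0.4, 0.5]]  # fixed row-stochastic matrix, an assumption of this sketch
x = [2.0, -1.0, 4.0]
pi = stationary(M)

y = dot(pi, x)  # y(0) = <pi(0), x(0)>
t_bar = 5
for t in range(t_bar):
    alpha = 1.0 / (t + 1)
    d = [subgrad(i, x[i]) for i in range(len(x))]
    y -= alpha * dot(pi, d)  # recursion (13)
    x = update(M, x, alpha, d)

for _ in range(200):  # subgradients frozen after t_bar: pure consensus
    x = update(M, x)

gap = max(abs(v - y) for v in x)  # every entry approaches the same value y(t_bar)
```

After the freeze, all local estimates agree with the value produced by (13) up to floating-point error, which is the sense in which $y(\bar{t})$ is the limiting consensus value.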


In our convergence analysis, we will use the well-known "almost supermartingale" convergence
theorem in [22], which can also be found as Lemma 11 in Chapter 2.2 of [20]. We present a simpler
deterministic version of the theorem in the next lemma.

Lemma 4. [22] Let $\{a_t\}_{t=0}^{\infty}$, $\{b_t\}_{t=0}^{\infty}$, and $\{c_t\}_{t=0}^{\infty}$ be non-negative sequences. Suppose that

$$a_{t+1} \le a_t - b_t + c_t \quad \text{for all } t \ge 0,$$

and $\sum_{t=0}^{\infty} c_t < \infty$. Then $\sum_{t=0}^{\infty} b_t < \infty$ and the sequence $\{a_t\}_{t=0}^{\infty}$ converges to a non-negative value.

The basic iterative relation of the consensus value $y(t)$ is stated in Lemma 5.

Lemma 5. Let $\{y(t)\}_{t=0}^{\infty}$ be the sequence of limiting consensus values defined by (12), and let $\{x_i(t)\}_{t=0}^{\infty}$
be the sequence for $i \in \mathcal{V} - \mathcal{F}$ generated by (7). Let $\{\delta_i(t)\}_{t=0}^{\infty}$ be a sequence of subgradients of $g_i$ at
$y(t)$ for all $i \in \mathcal{V} - \mathcal{F}$. Then the following basic relation holds. For any $x \in \mathbb{R}$ and any $t \ge 0$,

$$\begin{aligned}
|y(t + 1) - x|^2 \le{} & |y(t) - x|^2 + 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| \\
& - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j(x) \big) + \alpha^2(t)(n - \phi)L^2.
\end{aligned}$$

The proof of Lemma 5 can be found in [18]. For completeness, we also present it in Appendix E. For each $t$
and each $i \in \mathcal{V} - \mathcal{F}$, the distance between the consensus value $y(t)$ and the local estimate $x_i(t)$ is
bounded from above.

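Lemma 6. Let $U = \max_{i \in \mathcal{V} - \mathcal{F}} x_i(0)$ and $u = \min_{i \in \mathcal{V} - \mathcal{F}} x_i(0)$. For every $i \in \mathcal{V} - \mathcal{F}$, a uniform
bound on $|y(t) - x_i(t)|$ for $t \ge 1$ is given by:

$$|y(t) - x_i(t)| \le (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L. \qquad (14)$$

When $t = 1$, $\sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} = 0$ by convention.

Note that the upper bound on $|y(t) - x_i(t)|$ in (14) depends on $t$. In fact, this upper bound
diminishes over time, as formally stated below.

Lemma 7. For each $i \in \mathcal{V} - \mathcal{F}$, the limit of $|y(t) - x_i(t)|$ exists and

$$\lim_{t \to \infty} |y(t) - x_i(t)| = 0.$$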

Our main convergence result is stated below. 

Theorem 4 (Convergence). For each $i \in \mathcal{V} - \mathcal{F}$, $\{x_i(t)\}_{t=0}^{\infty}$ converges to the same optimum in
$X$, i.e.,

$$\lim_{t \to \infty} |x_i(t) - x^*| = 0,$$

where $x^* \in X$.
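The statement of Theorem 4 can be illustrated with a small fault-free simulation. This is only a sketch under assumptions that do not come from the report: the Byzantine-resilient filtering of Algorithm 2 is omitted, the consensus matrix `M` is a fixed row-stochastic matrix, the input functions are the hypothetical choices $h_j(x) = \mathrm{dist}(x, X_j)$ for intervals $X_j$ with a common point, and the step size is $\alpha(t) = 1/(t+1)$. Each agent $i$ holds $g_i = \sum_j A_{ji} h_j$ and runs a consensus step followed by a subgradient step.

```python
# Hypothetical optimal sets X_1, X_2, X_3; their intersection is X = [1.5, 2.0].
INTERVALS = [(0.0, 2.0), (1.0, 3.0), (1.5, 4.0)]

# A[j][i] is the weight of h_j in agent i's local cost g_i; every column sums to 1.
A = [[0.5, 0.0, 0.5, 1.0 / 3],
     [0.5, 0.5, 0.0, 1.0 / 3],
     [0.0, 0.5, 0.5, 1.0 / 3]]

# Fixed row-stochastic consensus matrix (assumption: no faulty agents, no trimming).
M = [[0.55 if i == j else 0.15 for j in range(4)] for i in range(4)]


def dist_subgrad(x, lo, hi):
    """A subgradient of h(x) = dist(x, [lo, hi])."""
    if x < lo:
        return -1.0
    if x > hi:
        return 1.0
    return 0.0


def local_subgrad(i, x):
    """A subgradient of g_i = sum_j A[j][i] * h_j at x."""
    return sum(A[j][i] * dist_subgrad(x, lo, hi)
               for j, (lo, hi) in enumerate(INTERVALS))


def run(x0, steps):
    x = list(x0)
    n = len(x)
    for t in range(steps):
        alpha = 1.0 / (t + 1)  # diminishing, square-summable step size
        d = [local_subgrad(i, x[i]) for i in range(n)]
        x = [sum(M[i][j] * x[j] for j in range(n)) - alpha * d[i] for i in range(n)]
    return x


final = run([-5.0, 0.0, 6.0, 10.0], 3000)
```

In this toy run the four local estimates end up close to one another and inside (or within a small tolerance of) the common optimal set $[1.5, 2]$, in line with the theorem. The run says nothing about the Byzantine setting, which is where the actual analysis is needed.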

We provide a sketch of the convergence proof below. The formal proof can be found in Appendix E.
Recall that each $g_i(\cdot)$ is defined as

$$g_i(x) = A_{1i} h_1(x) + A_{2i} h_2(x) + \ldots + A_{ki} h_k(x),$$

for $i \in \mathcal{V}$, where $A_{ji} \ge 0$ and $\sum_{j=1}^{k} A_{ji} = 1$. Let $Y_i = \operatorname{argmin} g_i(x)$ and $Y_j^i = \operatorname{argmin} A_{ji} h_j(x)$ for
$j = 1, \ldots, k$. For each $j \in \{1, \ldots, k\}$ such that $A_{ji} = 0$, the function $A_{ji} h_j(x) = 0$ is constant
over the whole real line, so $Y_j^i = \mathbb{R}$. Since positive constant scaling does not
affect the optimal set of a function, for each $j \in \{1, \ldots, k\}$ such that $A_{ji} > 0$, it holds that $Y_j^i = X_j$.
In addition, because $h_1(x), \ldots, h_k(x)$ are solution redundant functions, i.e., $\bigcap_{j=1}^{k} X_j \neq \emptyset$, functions
$A_{1i} h_1(x), \ldots, A_{ki} h_k(x)$ are also solution redundant. It can be shown (formally proved in Appendix
A) that

$$Y_i = \bigcap_{j:\, A_{ji} > 0} X_j \supseteq X \quad \text{for all } i \in \mathcal{V}.$$

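Let $x' \in X$. Define $g_j^*$ as the optimal value of function $g_j(\cdot)$ for each $j \in \mathcal{V}$. We have

$$\begin{aligned}
|y(t + 1) - x'|^2 \le{} & |y(t) - x'|^2 + 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| \\
& - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j(x') \big) + \alpha^2(t)(n - \phi)L^2 \\
\overset{(a)}{=}{} & |y(t) - x'|^2 + 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| \\
& - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big) + \alpha^2(t)(n - \phi)L^2. \qquad (15)
\end{aligned}$$

Equality (a) holds because $x' \in X \subseteq Y_j$ for each $j \in \mathcal{V}$, and thus $g_j(x') = g_j^*$.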

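For each $t \ge 0$, define

$$\begin{aligned}
a_t &= |y(t) - x'|^2, \\
b_t &= 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big), \\
c_t &= 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| + \alpha^2(t)(n - \phi)L^2.
\end{aligned}$$

It is easy to see that $a_t \ge 0$ and $c_t \ge 0$ for each $t$. Since $g_j^*$ is the optimal value of function $g_j(\cdot)$, it
holds that $b_t \ge 0$ for each $t$. Thus, $\{a_t\}_{t=0}^{\infty}$, $\{b_t\}_{t=0}^{\infty}$ and $\{c_t\}_{t=0}^{\infty}$ are three non-negative sequences.

By (15), it holds that

$$a_{t+1} \le a_t - b_t + c_t \quad \text{for each } t \ge 0.$$

To apply Lemma 4, we need to show that $\sum_{t=0}^{\infty} c_t < \infty$. In fact, the following lemma holds.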

Lemma 8.

$$\sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| < \infty.$$

The proof of Lemma 8 is presented in Appendix E. In addition, since $\sum_{t=0}^{\infty} \alpha^2(t) < \infty$, it holds
that

$$(n - \phi)L^2 \sum_{t=0}^{\infty} \alpha^2(t) < \infty.$$

Thus, we get

$$\begin{aligned}
\sum_{t=0}^{\infty} c_t &= \sum_{t=0}^{\infty} \left( 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| + \alpha^2(t)(n - \phi)L^2 \right) \\
&= 4L \sum_{t=0}^{\infty} \left( \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| \right) + (n - \phi)L^2 \sum_{t=0}^{\infty} \alpha^2(t) \\
&< \infty.
\end{aligned}$$

Therefore, applying Lemma 4 to the sequences $\{a_t\}_{t=0}^{\infty}$, $\{b_t\}_{t=0}^{\infty}$ and $\{c_t\}_{t=0}^{\infty}$, we have that for any
$x' \in X$, $a_t = |y(t) - x'|^2$ converges, and

$$\sum_{t=0}^{\infty} b_t = 2 \sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big) < \infty.$$

Since $|y(t) - x'|$ converges for any fixed $x' \in X$, by the definition of sequence convergence and the
dynamic of $y(t)$ in (12), it is easy to see that $y(t)$ also converges. Let $\lim_{t \to \infty} y(t) = y$. Next we
show that $y \in X$.


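Let $\mathcal{I}_{t+1} \subseteq \mathcal{V} - \mathcal{F}$ be the set of indices such that for each $j \in \mathcal{I}_{t+1}$, $\pi_j(t + 1) \ge \beta^{\nu}$. As $G(\mathcal{V}, \mathcal{E})$
satisfies Condition 1, $|\mathcal{I}_{t+1}| \ge \max\{k', f + 1\}$. Since $g_j(y(t)) - g_j^* \ge 0$ for all $j$, then

$$\begin{aligned}
\sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big)
&\ge \sum_{j \in \mathcal{I}_{t+1}} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big) \\
&\ge \beta^{\nu} \sum_{j \in \mathcal{I}_{t+1}} \big( g_j(y(t)) - g_j^* \big) \\
&= \beta^{\nu} \sum_{j \in \mathcal{I}_{t+1}} \sum_{i=1}^{k} A_{ij} \big( h_i(y(t)) - h_i^* \big) \\
&= \beta^{\nu} \sum_{i=1}^{k} \Big( \sum_{j \in \mathcal{I}_{t+1}} A_{ij} \Big) \big( h_i(y(t)) - h_i^* \big) \\
&\ge k \beta^{\nu} C_2 \big( h(y(t)) - h^* \big),
\end{aligned}$$

where

$$C_2 = \min_{1 \le i \le k}\ \min_{\mathcal{I} \subseteq \mathcal{V}:\ |\mathcal{I}| \ge \max\{k', f+1\}}\ \sum_{j \in \mathcal{I}} A_{ij},$$

and the last inequality follows from the fact that $h_i(y(t)) - h_i^* \ge 0$. In addition, as $sp(A) = k'$,
$\sum_{j \in \mathcal{I}} A_{ij} > 0$ for every $\mathcal{I} \subseteq \mathcal{V}$ with $|\mathcal{I}| \ge \max\{k', f + 1\}$. Since $A$ is finite, $C_2$ is well-defined and
$C_2 > 0$. If $y \notin X$, it can be shown that $\sum_{t=0}^{\infty} \alpha(t)\, k \beta^{\nu} C_2 \big( h(y(t)) - h^* \big) = \infty$. This contradicts the fact that
$\sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j^* \big) < \infty$. Thus, $y \in X$.

Therefore, by Lemma 7, we conclude that the limit of $|x_i(t) - y|$ exists and

$$\lim_{t \to \infty} |x_i(t) - y| = 0,$$

proving Theorem 4.

5 Summary and Conclusion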

In this report, we introduce the condition-based approach to Byzantine multi-agent optimization.
We have shown that when there is enough redundancy in the local cost functions, or in the local
optima, Problem 1 can be solved iteratively.

Two slightly different variants are considered: condition-based Byzantine multi-agent optimization
with side information and condition-based Byzantine multi-agent optimization without side
information. For the former, when side information is available at each agent, a decoding-based
algorithm is proposed, assuming each input function is differentiable. This algorithm combines
the gradient method with the decoding procedure introduced in [4] by choosing proper "generator
matrices" as job assignment matrices. With such a decoding subroutine, our algorithm essentially
performs the gradient method, where the gradient computation is distributed over
the multi-agent system. When side information is not available at each agent, we propose a simple
consensus-based algorithm in which each agent carries minimal state across iterations. This
consensus-based algorithm solves Problem 1 under the additional assumption
that all input functions share at least one common optimum. Although the consensus-based
algorithm can only solve Problem 1 for a restricted class of input functions, each
non-faulty agent does not need to store the job matrix $A$ throughout execution and does not need
to perform the decoding procedure at each iteration, so the requirements on memory and computation
are less stringent compared to the decoding-based algorithm. In addition, in contrast to the
decoding-based algorithm, the consensus-based algorithm also works for nonsmooth input functions.
Thus, the consensus-based algorithm may be more practical in some applications.

Appendices


A Connection between $X$ and $X_j$'s for $j = 1, \ldots, k$


Recall that $X_j$ is the set of optimal solution(s) of input function $h_j(x)$, for $j = 1, \ldots, k$, and that $X$
is the optimal set of function $\frac{1}{k} \sum_{j=1}^{k} h_j(x)$ in Problem 1. Propositions 1 and 2 are used in proving
the correctness of other results in this report.


Proposition 1. [23] The optimal set $X$ of Problem 1 is contained in the convex hull of the union
of all $X_j$'s, i.e.,

$$X \subseteq \mathrm{Cov}\Big( \bigcup_{j=1}^{k} X_j \Big), \qquad (16)$$

where $\mathrm{Cov}(Z)$ is the convex hull of set $Z$.


The above proposition holds for any collection of $k$ admissible input functions. A stronger
connection holds for solution-redundant input functions, as stated below.


Proposition 2. When all the input functions share at least one common optimum, i.e., $\bigcap_{j=1}^{k} X_j \neq \emptyset$,
then

$$X = \bigcap_{j=1}^{k} X_j. \qquad (17)$$


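Proof. We first show that $\bigcap_{j=1}^{k} X_j \subseteq X$. Let $h_j^*$ be the optimal value of function $h_j(x)$ for $j =
1, \ldots, k$, and let $h^*$ be the optimal value of function $\frac{1}{k} \sum_{j=1}^{k} h_j(x)$. Since $\bigcap_{j=1}^{k} X_j \neq \emptyset$, let $x_0 \in
\bigcap_{j=1}^{k} X_j$. Then for all $x \in \mathbb{R}$,

$$\frac{1}{k} \sum_{j=1}^{k} h_j(x_0) = \frac{1}{k} \sum_{j=1}^{k} h_j^* \le \frac{1}{k} \sum_{j=1}^{k} h_j(x).$$

So we know

$$\frac{1}{k} \sum_{j=1}^{k} h_j(x_0) = \frac{1}{k} \sum_{j=1}^{k} h_j^* = h^*.$$

Thus $x_0 \in X$ and $\bigcap_{j=1}^{k} X_j \subseteq X$.

Next we show $X \subseteq \bigcap_{j=1}^{k} X_j$. We prove this by contradiction. Suppose on the contrary that
$X \not\subseteq \bigcap_{j=1}^{k} X_j$; then there exists $x' \in X$ such that $x' \notin \bigcap_{j=1}^{k} X_j$. The latter implies that $x' \notin X_{j_0}$ for
some $j_0 \in \{1, \ldots, k\}$. Then

$$\begin{aligned}
\frac{1}{k} \sum_{j=1}^{k} h_j(x') &= \frac{1}{k} \Big( \sum_{1 \le j \le k,\ j \neq j_0} h_j(x') + h_{j_0}(x') \Big) \\
&\ge \frac{1}{k} \Big( \sum_{1 \le j \le k,\ j \neq j_0} h_j^* + h_{j_0}(x') \Big) \\
&> \frac{1}{k} \Big( \sum_{1 \le j \le k,\ j \neq j_0} h_j^* + h_{j_0}^* \Big) \\
&= \frac{1}{k} \sum_{j=1}^{k} h_j^* = h^*.
\end{aligned}$$

So $x' \notin X$, which leads to a contradiction. Thus $X \subseteq \bigcap_{j=1}^{k} X_j$.
Therefore, $X = \bigcap_{j=1}^{k} X_j$.


□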
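Proposition 2 can be sanity-checked on a concrete instance. The sketch below assumes the hypothetical input functions $h_j(x) = \mathrm{dist}(x, X_j)$ for three intervals $X_j$ with a common point, and compares the minimizers of the average cost on a grid with the intersection of the intervals. The grid search is only a numerical illustration, not a proof.

```python
def dist_to_interval(x, lo, hi):
    """h(x) = distance from x to [lo, hi]; convex, with optimal set [lo, hi]."""
    return max(lo - x, 0.0, x - hi)


# Hypothetical optimal sets X_1, X_2, X_3 sharing the common points [1.5, 2.0].
intervals = [(0.0, 2.0), (1.0, 3.0), (1.5, 4.0)]

grid = [i / 100.0 for i in range(-100, 501)]  # points -1.00, -0.99, ..., 5.00
avg = [sum(dist_to_interval(x, lo, hi) for lo, hi in intervals) / len(intervals)
       for x in grid]

best = min(avg)
minimizers = [x for x, v in zip(grid, avg) if v <= best + 1e-12]

# Intersection of the individual optimal sets.
intersection = (max(lo for lo, _ in intervals), min(hi for _, hi in intervals))
```

On this instance the grid minimizers of $\frac{1}{k}\sum_j h_j$ span exactly the interval given by `intersection`, as (17) predicts.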


B Condition-based Byzantine multi-agent optimization without side
information

Proof of Lemma 1

Proof. Let $sp(A) = k'$. By the definition of $sp(A)$, the sum vector of any collection of $k'$ columns of $A$
is component-wise positive. Suppose there exists a row, say $i_0$, that contains at least $k'$ zero entries.
Let $j_1, j_2, \ldots, j_{k'}$ be any $k'$ columns of $A$ whose $i_0$-th coordinate is zero. Then
the $i_0$-th coordinate of $\sum_{r=1}^{k'} A_{j_r}$ is zero, contradicting the hypothesis that $sp(A) = k'$. In addition,
if every row contains fewer than $k' - 1$ zeros, the same argument shows that the sum of any $k' - 1$
columns is component-wise positive, so $k'$ is not the smallest such integer.

Conversely, we need to show that if there are at most $k' - 1$ zero entries in each row of $A$ and
there exists one row that contains exactly $k' - 1$ zero entries, then $sp(A) = k'$. Let $j_1, \ldots, j_{k'}$ be any
collection of $k'$ columns of $A$. For each coordinate, at least one of the chosen $k'$ columns contains a
positive entry in that coordinate. So we have $\sum_{r=1}^{k'} A_{j_r} > 0$ component-wise. By the definition of $sp(A)$,
we know $sp(A) \le k'$. In addition, let $i_0$ be a row in which there are exactly $k' - 1$ zeros. Then there
exists a collection of $k' - 1$ columns whose $i_0$-th coordinates are all zero, so the sum of these
$k' - 1$ columns also has its $i_0$-th coordinate equal to 0. Thus $sp(A) = k'$.

□
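The characterization proved above says that $sp(A)$ equals one plus the largest number of zero entries in any row of $A$. The following sketch compares that shortcut with a brute-force evaluation of the definition used in the proof (the smallest $k'$ such that the sum of any $k'$ columns is component-wise positive). The function names and example matrices are illustrative assumptions.

```python
from itertools import combinations


def sp_bruteforce(A):
    """Smallest k' such that every choice of k' columns sums to a positive vector."""
    k, n = len(A), len(A[0])
    for size in range(1, n + 1):
        if all(all(sum(A[row][c] for c in cols) > 0 for row in range(k))
               for cols in combinations(range(n), size)):
            return size
    return None  # some row is identically zero, so no such k' exists


def sp_by_zero_count(A):
    """Lemma 1 shortcut: one plus the largest number of zeros in any row."""
    n = len(A[0])
    max_zeros = max(sum(1 for v in row if v == 0) for row in A)
    return max_zeros + 1 if max_zeros < n else None


# Hypothetical job assignment matrices with non-negative entries.
A1 = [[0.5, 0.0, 0.5, 1.0 / 3],
      [0.5, 0.5, 0.0, 1.0 / 3],
      [0.0, 0.5, 0.5, 1.0 / 3]]
A2 = [[1.0, 0.0, 0.0, 0.5],
      [0.0, 1.0, 1.0, 0.5]]
```

The brute-force routine enumerates all column subsets and is only practical for small matrices; the zero-count form is linear in the size of $A$.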


Proof of Theorem 2

Proof. We first show that if Problem 1 is solvable, then a source component containing at least
$f + 1$ nodes must exist in every reduced graph of $G(\mathcal{V}, \mathcal{E})$. Then we show that when $k' > f + 1$, the
source component must contain at least $k'$ nodes. These two claims together show that if Problem
1 is solvable, then a source component containing at least $\max\{f + 1, k'\}$ nodes must exist in every
reduced graph of $G(\mathcal{V}, \mathcal{E})$, proving the theorem.

Definition 4 (Condition 2). Given a graph $G(\mathcal{V}, \mathcal{E})$, for any node partition $L, R, C, F$ of $G(\mathcal{V}, \mathcal{E})$
such that $L, R$ are nonempty and $|F| \le f$, one of the following must hold: (1) there exists a node
$i \in L$ that has at least $f + 1$ incoming neighbors in $R \cup C$, i.e., $|N_i^- \cap (R \cup C)| \ge f + 1$; or (2) there
exists a node $j \in R$ that has at least $f + 1$ incoming neighbors in $L \cup C$, i.e., $|N_j^- \cap (L \cup C)| \ge f + 1$.
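For small graphs, Condition 2 can be checked directly by enumerating all partitions. The sketch below is a brute-force illustration (exponential in the number of nodes), with a hypothetical helper `complete_graph` used only to build test inputs; edges are ordered pairs `(u, v)` meaning that `u` is an incoming neighbor of `v`.

```python
from itertools import product


def satisfies_condition_2(nodes, edges, f):
    """Check Condition 2 by enumerating every partition (L, R, C, F) of the nodes."""
    in_nbrs = {v: {u for (u, w) in edges if w == v} for v in nodes}
    for labels in product("LRCF", repeat=len(nodes)):
        part = {name: {v for v, lab in zip(nodes, labels) if lab == name}
                for name in "LRCF"}
        L, R, C, F = part["L"], part["R"], part["C"], part["F"]
        if not L or not R or len(F) > f:
            continue  # the condition only constrains these partitions
        ok_left = any(len(in_nbrs[i] & (R | C)) >= f + 1 for i in L)
        ok_right = any(len(in_nbrs[j] & (L | C)) >= f + 1 for j in R)
        if not (ok_left or ok_right):
            return False  # this partition violates both (1) and (2)
    return True


def complete_graph(n):
    """Hypothetical test input: all ordered pairs of distinct nodes."""
    nodes = list(range(n))
    edges = {(u, v) for u in nodes for v in nodes if u != v}
    return nodes, edges
```

For example, with $f = 1$ the complete graph on four nodes passes the check, while the complete graph on three nodes fails it (take one node each in $L$, $R$ and $F$).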

Now we show that Condition 2 is a necessary condition for Problem 1. Suppose graph $G(\mathcal{V}, \mathcal{E})$ does
not satisfy Condition 2 and there exists a correct algorithm $\mathcal{A}$ solving Problem 1 for solution redundant
input functions. Recall that $X_i = \operatorname{argmin} h_i(x)$ for each $i = 1, \ldots, k$, and $X = \operatorname{argmin} h(x)$.
Consider the input functions $h_i(x)$ for all $i = 1, \ldots, k$ such that each $h_i(x)$ is admissible with
optimal set $X_i = [0, 1]$. Since $\bigcap_{i=1}^{k} X_i = [0, 1] \neq \emptyset$, $h_1(x), \ldots, h_k(x)$ is a collection of $k$ solution
redundant input functions. In addition, by Proposition 2, we know that $X = [0, 1]$. Since $G(\mathcal{V}, \mathcal{E})$
does not satisfy Condition 2, there exists a node partition $L, R, C, F$, where $L, R$ are nonempty
and $|F| \le f$, such that $|N_i^- \cap (R \cup C)| \le f$ for each $i \in L$ and $|N_j^- \cap (L \cup C)| \le f$ for each $j \in R$.
Consider the execution, denoted by $e_1$, wherein all nodes in $F$ are faulty and all the remaining
nodes are non-faulty. The initial states of all non-faulty nodes are assigned as follows: $x_i(0) = 0$ for
each $i \in L$, $x_i(0) = 1$ for each $i \in R$, and $x_i(0)$ is an arbitrary value within $[0, 1]$ for each $i \in C$.
In iteration 1, each faulty node $p$ sends $F_p(0, g_p(\cdot))$ to nodes in $L$, sends $F_p(1, g_p(\cdot))$ to nodes in $R$,
and sends $F_p(a, g_p(\cdot))$ to nodes in $C$, where $a$ is an arbitrary value within $[0, 1]$. Let $i \in L$ be an
arbitrary node in $L$. We will show that there exists an execution $e_i$ that cannot be distinguished
from $e_1$ by node $i$. Thus node $i$ should behave in the same way in $e_1$ and $e_i$. Let $x_i(t)$ and $\tilde{x}_i(t)$ be
the local estimates of agent $i$ in $e_1$ and in $e_i$, respectively.

Execution $e_i$: The input functions are $h_1(x), \ldots, h_k(x)$. All nodes in $N_i^- \cap (R \cup C)$ are faulty, and
the other nodes are non-faulty with initial state 0, i.e., $\tilde{x}_j(0) = 0$ for all $j \notin N_i^- \cap (R \cup C)$ (see footnote 4). Since
$\tilde{x}_j(0) = 0 \in [0, 1] = X$ for all $j \in \mathcal{V} - \tilde{F}$, then $\tilde{x}_j(1) = 0$ for all $j \in \mathcal{V} - \tilde{F}$, where $\tilde{F} = N_i^- \cap (R \cup C)$
is the set of faulty nodes in execution $e_i$.

Since agent $i$ cannot distinguish execution $e_i$ from $e_1$, $x_i(1) = 0$. As agent $i$ is an arbitrary
agent in $L$, in execution $e_1$ it holds that $x_i(1) = 0$ for all $i \in L$. Similarly, we can show that in
execution $e_1$, $x_j(1) = 1$ for all $j \in R$. Repeatedly applying the above argument, we can conclude
that for any iteration $t$ in execution $e_1$, it follows that $x_i(t) = 0$ for all $i \in L$ and $x_j(t) = 1$ for all
$j \in R$, contradicting the asymptotic consensus requirement of a correct algorithm for Problem 1.
Thus, Condition 2 is a necessary condition for Problem 1. In addition, it was shown in [28] that
graph $G(\mathcal{V}, \mathcal{E})$ satisfies Condition 2 if and only if a source component containing at least
$f + 1$ nodes exists in every reduced graph of $G(\mathcal{V}, \mathcal{E})$.

Therefore, if Problem 1 is solvable, a source component containing at least $f + 1$ nodes must
exist in every reduced graph of $G(\mathcal{V}, \mathcal{E})$. Next we show, by contradiction, that if Problem 1 is solvable
and $k' > f + 1$, a source component must contain at least $k'$ nodes.


Suppose there exists a reduced graph $\mathcal{H}$ of $G(\mathcal{V}, \mathcal{E})$ whose source component contains at most
$k' - 1$ agents, and there exists a correct algorithm $\mathcal{A}$ that can solve Problem 1. Denote the source
component of the reduced graph $\mathcal{H}$ by $S_{\mathcal{H}}$. Let $L, R, C, F$ be the node partition of $G(\mathcal{V}, \mathcal{E})$ where
$L = S_{\mathcal{H}}$, $C = \emptyset$, $R = \mathcal{V} - L - F$ and $F = \mathcal{F}$. Note that it is possible that $R = \emptyset$. Let $h_1(x), \ldots, h_k(x)$
and $\tilde{h}_1(x), \ldots, \tilde{h}_k(x)$ be two collections of $k$ admissible input functions such that $\tilde{h}_i(\cdot) = h_i(\cdot)$
for $i = 1, \ldots, k - 1$, $\operatorname{argmin} h_i(x) = [0, 1]$ for all $i = 1, \ldots, k$, and $\operatorname{argmin} \tilde{h}_k(x) = \{1\}$. Let $A$ be
an assignment matrix such that $sp(A) = k'$ and $A_{ki} = 0$ for each $i \in L$. Such a matrix exists,
since $|L| \le k' - 1$. Informally speaking, with this assignment matrix, each agent $i$ in $L$ does not
have any information about the $k$-th input function. Consider the execution $E_1$, wherein the input
functions are $\tilde{h}_1(x), \ldots, \tilde{h}_k(x)$, each agent $p$ in $F$ is faulty and all other agents are non-faulty. The
initial states of non-faulty agents in execution $E_1$ are assigned as follows: $x_i(0) = 0$ for all $i \in L$,
and $x_i(0) = 1$ for all $i \in R$ (if $R \neq \emptyset$), recalling that $C = \emptyset$. Each faulty agent $p \in F$ sends
$F_p\big(0, \sum_{i=1}^{k} A_{ip} h_i(x)\big)$ to nodes in $L$, and sends $F_p\big(1, \sum_{i=1}^{k} A_{ip} h_i(x)\big)$ to nodes in $R$ (if $R \neq \emptyset$).

Footnote 4. Execution $e_i$ is possible since $|N_i^- \cap (R \cup C)| \le f$.

Let $i \in L$ be an arbitrary agent in $L$. Now consider the following execution, denoted by $E_i$,
wherein the local estimate of each non-faulty node $j$ is denoted by $\tilde{x}_j$. The input functions are
$h_1(x), \ldots, h_k(x)$. All nodes in $N_i^- \cap (R \cup C)$ are faulty, and the other nodes are non-faulty with
initial state 0, i.e., $\tilde{x}_j(0) = 0$ for all $j \notin N_i^- \cap (R \cup C)$. Since $\tilde{x}_j(0) = 0 \in [0, 1] = X$ for all $j \in \mathcal{V} - \tilde{F}$,
then $\tilde{x}_j(t) = 0$ for all $j \in \mathcal{V} - \tilde{F}$ and for all $t$, where $\tilde{F} = N_i^- \cap (R \cup C)$ is the set of faulty nodes in
execution $E_i$.

Node $i$ cannot distinguish $E_1$ from $E_i$, thus $x_i(1) = 0$ in $E_1$. Since $i$ is an arbitrary node in $L$,
$x_i(1) = 0$ for all $i \in L$ in $E_1$. Repeatedly applying the above argument, we have $\lim_{t \to \infty} x_i(t) = 0$
for each $i \in L$ in $E_1$. However, we know that in $E_1$ the optimal set is $X = \{1\}$. Because in execution
$E_1$ the correct output must be 1, $\mathcal{A}$ is not a correct algorithm. Thus, we know that if Problem 1 is
solvable and $k' > f + 1$, a source component must contain at least $k'$ nodes.


Therefore, we conclude that if Problem 1 is solvable, a source component containing at least
$\max\{f + 1, k'\}$ nodes must exist in every reduced graph of $G(\mathcal{V}, \mathcal{E})$, proving Theorem 2.


□ 


Proof of Corollary 1

Proof. Let $sp(A) = k'$. It was shown in [27] that if Condition 2 is true, then $n \ge 3f + 1$. It is
enough to consider the case when $k' > f + 1$.

Suppose $3f + 1 \le n < k' + 2f$. Consider the node partition $L, R, F$ such that $|R| = |F| = f$, and
$L = \mathcal{V} - R - F$. Since $3f + 1 \le n < k' + 2f$, it holds that $f + 1 \le |L| \le k' - 1$. Suppose all nodes in
$F$ are faulty. Consider the subgraph $\mathcal{H}$ constructed from $G(\mathcal{V}, \mathcal{E})$ by (1) removing all faulty nodes,
i.e., all nodes in $F$, and (2) for each $i \in L$, removing all incoming links from $R$. The subgraph $\mathcal{H}$ is
a valid reduced graph since $|R| = f$. By Theorem 2, a source component exists in $\mathcal{H}$. Let $S$ be the
source component of $\mathcal{H}$. By Theorem 2, it holds that $|S| \ge k'$. Since each node $j \in R$ cannot reach
nodes in $L$, by definition, each node $j \in R$ is not contained in a source component. Thus, $S \subseteq L$.
Consequently, it holds that $|S| \le |L| \le k' - 1$, contradicting the fact that $|S| \ge k'$. Thus, when
$k' > f + 1$, it holds that $n \ge k' + 2f$.


Therefore, we conclude that $n \ge \max\{3f + 1, k' + 2f\}$.


□ 


C Matrix Representation of Algorithm 2

If $G(\mathcal{V}, \mathcal{E})$ satisfies Condition 1, it is shown in Proposition 3 that the updates of $x \in \mathbb{R}^{n-\phi}$ in
each iteration can be written compactly in matrix form. This observation is made in [26], and we
restate the result below for completeness.

Proposition 3. [26] We can express the iterative update of the state of a non-faulty node $i$ ($1 \le
i \le n - \phi$) performed in (4) using the matrix form in (18) below, where $M_i(t)$ satisfies the following
four conditions.

$$x_i(t + 1) = M_i(t)\, x(t) - \alpha(t)\, d_i(t). \qquad (18)$$

In addition to $t$, the row vector $M_i(t)$ may depend on the state vector $x(t - 1)$ as well as the
behavior of the faulty nodes in $\mathcal{F}$. For simplicity, the notation $M_i(t)$ does not explicitly represent
this dependence.

1. $M_i(t)$ is a stochastic row vector of size $(n - \phi)$. Thus, $M_{ij}(t) \ge 0$ for $1 \le j \le n - \phi$, and
$\sum_{1 \le j \le n - \phi} M_{ij}(t) = 1$.

2. $M_{ii}(t)$ equals $a_i$ defined in Algorithm 1. Recall that $a_i \ge a$.

3. $M_{ij}(t)$ is non-zero only if $(j, i) \in \mathcal{E}$ or $j = i$.

4. At least $|N_i^- \cap (\mathcal{V} - \mathcal{F})| - f + 1$ elements in $M_i(t)$ are lower bounded by some constant $\beta > 0$
($\beta$ is independent of $i$). Note that $N_i^- \cap (\mathcal{V} - \mathcal{F})$ is the set of non-faulty incoming neighbors of
node $i$.

D Convergence of the Transition Matrices $\Phi(t, r)$

Proof of Lemma 2

Proof. Recall that $R_{\mathcal{F}}$ is the collection of all reduced graphs of the given graph $G(\mathcal{V}, \mathcal{E})$. Let $\mathcal{H} \in R_{\mathcal{F}}$
be an arbitrary reduced graph with adjacency matrix $\mathbf{H}$. Let $k' = sp(A)$. From Theorem 2 we know
that there are at least $\max\{k', f + 1\}$ nodes in the unique source component in $\mathcal{H}$. Denote the source
component in $\mathcal{H}$ by $S_{\mathcal{H}}$ and let $j_1, \ldots, j_p$, where

$$p = |S_{\mathcal{H}}| \ge \max\{k', f + 1\},$$

be the $p$ nodes in $S_{\mathcal{H}}$. By definition, each $j_i$ has a directed path to all the other non-faulty nodes
in $\mathcal{H}$. Since the length of a path from $j_i$ to any other node in $\mathcal{H}$ is at most $n - \phi - 1$, the $j_i$-th
column of $\mathbf{H}^{n-\phi}$ will be non-zero for $i = 1, 2, \ldots, p$. Since $p \ge \max\{k', f + 1\}$, there are at least
$\max\{k', f + 1\}$ such columns in $\mathbf{H}^{n-\phi}$.

Recall that for any $t \ge 1$, there exists a graph $\mathcal{H}(t) \in R_{\mathcal{F}}$ such that $\beta \mathbf{H}(t) \le M(t)$; thus we
have

$$\Phi(r + \nu - 1, r) = M(r + \nu - 1)\, M(r + \nu - 2) \cdots M(r) \ge \beta^{\nu} \prod_{t=r}^{r+\nu-1} \mathbf{H}(t).$$

The above product of adjacency matrices consists of $\nu = \tau(n - \phi)$ matrices (corresponding to
reduced graphs) in $R_{\mathcal{F}}$. Thus, at least one of the $\tau$ distinct adjacency matrices in $R_{\mathcal{F}}$, say that of $\mathcal{H}'$,
will appear in the above product at least $n - \phi$ times. Let $S_{\mathcal{H}'}$ and $\mathbf{B}$ be the source component
and the adjacency matrix, respectively, of $\mathcal{H}'$. In addition, let $p' = |S_{\mathcal{H}'}|$ and let $j_1, \ldots, j_{p'}$ be the
nodes in $S_{\mathcal{H}'}$. Due to the existence of self-loops in the update dynamic, each $\mathbf{H}(t)$ has a positive
diagonal. In addition, $\mathbf{B}^{n-\phi}$ contains $p'$ non-zero columns, where $p' \ge \max\{k', f + 1\}$. Thus each of the
$j_1, j_2, \ldots, j_{p'}$ columns in $\prod_{t=r}^{r+\nu-1} \mathbf{H}(t)$ is lower bounded by $\mathbf{1} \in \mathbb{R}^{n-\phi}$ component-wise, i.e.,

$$\Big[ \prod_{t=r}^{r+\nu-1} \mathbf{H}(t) \Big]_{\cdot j_i} \ge \mathbf{1} \quad \text{for } i = 1, \ldots, p',$$

where $\big[ \prod_{t=r}^{r+\nu-1} \mathbf{H}(t) \big]_{\cdot j_i}$ is the $j_i$-th column of $\prod_{t=r}^{r+\nu-1} \mathbf{H}(t)$. Therefore,

$$\big[ \Phi(r + \nu - 1, r) \big]_{\cdot j_i} \ge \beta^{\nu} \mathbf{1}$$

for $i = 1, 2, \ldots, p'$, where $p' \ge \max\{k', f + 1\}$, noting that $\big[ \Phi(r + \nu - 1, r) \big]_{\cdot j_i}$ is the $j_i$-th column
of $\Phi(r + \nu - 1, r)$.

□
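The key combinatorial step of the proof, that the columns of a power of the adjacency matrix indexed by source-component nodes have no zero entries, can be checked on a small example. The sketch below uses a hypothetical reduced graph on four non-faulty nodes and Boolean matrix products; following the convention of Proposition 3, entry $(i, j)$ is non-zero when $(j, i)$ is an edge or $j = i$.

```python
def adjacency_with_self_loops(n, edges):
    """H[i][j] = 1 iff (j, i) is an edge or i == j."""
    H = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for (j, i) in edges:
        H[i][j] = 1
    return H


def bool_matmul(a, b):
    """Boolean product: tracks only which entries are non-zero."""
    n = len(a)
    return [[1 if any(a[i][k] and b[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def power(H, m):
    """Non-zero pattern of H^m for m >= 1."""
    P = H
    for _ in range(m - 1):
        P = bool_matmul(P, H)
    return P


def positive_columns(P):
    """Indices of columns that have no zero entry."""
    n = len(P)
    return [j for j in range(n) if all(P[i][j] for i in range(n))]


# Hypothetical reduced graph: nodes 0 and 1 form the source component,
# node 1 feeds node 2, and node 2 feeds node 3.
edges = [(0, 1), (1, 0), (1, 2), (2, 3)]
H = adjacency_with_self_loops(4, edges)
cols = positive_columns(power(H, 4))  # power n - phi = 4 in this example
```

Here `cols` contains exactly the nodes of the source component: those are the nodes from which every other node can be reached within $n - \phi - 1$ hops.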


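Proof of Lemma 3

Proof. From Lemma 2, we know that there are at least $\max\{sp(A), f + 1\}$ columns of $\Phi(r + \nu - 1, r)$
that are lower bounded by $\beta^{\nu} \mathbf{1}$ for a given $r$. Let $\mathcal{I}_r$ be the collection of column indices such that
for each $i \in \mathcal{I}_r$,

$$\big[ \Phi(r + \nu - 1, r) \big]_{\cdot i} \ge \beta^{\nu} \mathbf{1}.$$

Let $t \ge r + \nu$. From (8), we know that

$$\lim_{t \ge r,\ t \to \infty} \Phi(t, r) = \mathbf{1}\pi(r)$$

for all $r$. By the definition of $\Phi(t, r)$, for $t > r + \nu - 1$ we have

$$\Phi(t, r) = \Phi(t, r + \nu)\, \Phi(r + \nu - 1, r).$$

Thus,

$$\begin{aligned}
\mathbf{1}\pi(r) &= \lim_{t \ge r,\ t \to \infty} \Phi(t, r) \\
&= \lim_{t \ge r,\ t \to \infty} \Phi(t, r + \nu)\, \Phi(r + \nu - 1, r) \\
&= \big( \mathbf{1}\pi(r + \nu) \big)\, \Phi(r + \nu - 1, r).
\end{aligned}$$

Thus, for each $i \in \mathcal{I}_r$,

$$\pi_i(r) = \sum_{j=1}^{n-\phi} \pi_j(r + \nu)\, \Phi_{ji}(r + \nu - 1, r) \ge \Big( \sum_{j=1}^{n-\phi} \pi_j(r + \nu) \Big) \beta^{\nu} = \beta^{\nu}.$$

□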


E Convergence Analysis of Algorithm 1 


Proof of Lemma 5 


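The proof of Lemma 5 can be found in [18]. We present the proof below for completeness.

Proof. For any $x \in \mathbb{R}$ and any $t \ge 0$,

$$\begin{aligned}
|y(t + 1) - x|^2 &= |y(t) - \alpha(t) \langle \pi(t + 1), d(t) \rangle - x|^2 \\
&= |y(t) - x|^2 - 2\alpha(t) \langle \pi(t + 1), d(t) \rangle (y(t) - x) + \alpha^2(t) \langle \pi(t + 1), d(t) \rangle^2 \\
&\overset{(a)}{\le} |y(t) - x|^2 - 2\alpha(t) \langle \pi(t + 1), d(t) \rangle (y(t) - x) + \alpha^2(t)\, \|\pi(t + 1)\|^2\, \|d(t)\|^2 \\
&\overset{(b)}{\le} |y(t) - x|^2 - 2\alpha(t) \langle \pi(t + 1), d(t) \rangle (y(t) - x) + \alpha^2(t)\, \|d(t)\|^2 \\
&= |y(t) - x|^2 - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, d_j(t)\, (y(t) - x) + \alpha^2(t) \sum_{j=1}^{n-\phi} d_j^2(t).
\end{aligned}$$

Inequality (a) follows from the Cauchy-Schwarz inequality. Inequality (b) follows because

$$\|\pi(t + 1)\|^2 = \sum_{j=1}^{n-\phi} \pi_j^2(t + 1) \le \sum_{j=1}^{n-\phi} \pi_j(t + 1) = 1.$$

We now consider the term $d_j(t)(y(t) - x)$ for any $j \in \mathcal{V} - \mathcal{F}$, for which we have

$$\begin{aligned}
d_j(t)(y(t) - x) &= d_j(t)\big( y(t) - x_j(t) + x_j(t) - x \big) \\
&= d_j(t)(y(t) - x_j(t)) + d_j(t)(x_j(t) - x) \\
&\ge -|d_j(t)|\, |y(t) - x_j(t)| + d_j(t)(x_j(t) - x) \\
&\ge -|d_j(t)|\, |y(t) - x_j(t)| + g_j(x_j(t)) - g_j(x), \qquad (19)
\end{aligned}$$

since $d_j(t)$ is a subgradient of $g_j(\cdot)$ at $x_j(t)$. Furthermore, by using a subgradient $\delta_j(t)$ of $g_j(\cdot)$ at $y(t)$, we
also have, for any $j \in \mathcal{V} - \mathcal{F}$ and $x \in \mathbb{R}$,

$$\begin{aligned}
g_j(x_j(t)) - g_j(x) &= g_j(x_j(t)) - g_j(y(t)) + g_j(y(t)) - g_j(x) \\
&\ge \delta_j(t)(x_j(t) - y(t)) + g_j(y(t)) - g_j(x) \\
&\ge -|\delta_j(t)|\, |x_j(t) - y(t)| + g_j(y(t)) - g_j(x). \qquad (20)
\end{aligned}$$

Combining (19) and (20), for any $j \in \mathcal{V} - \mathcal{F}$ and any $x \in \mathbb{R}$ we obtain

$$d_j(t)(y(t) - x) \ge -\big( |d_j(t)| + |\delta_j(t)| \big) |y(t) - x_j(t)| + g_j(y(t)) - g_j(x).$$

Therefore,

$$\begin{aligned}
|y(t + 1) - x|^2 \le{} & |y(t) - x|^2 - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, d_j(t)\, (y(t) - x) + \alpha^2(t) \sum_{j=1}^{n-\phi} d_j^2(t) \\
\le{} & |y(t) - x|^2 + 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \Big( \big( |d_j(t)| + |\delta_j(t)| \big) |y(t) - x_j(t)| - \big( g_j(y(t)) - g_j(x) \big) \Big) \\
& + \alpha^2(t) \sum_{j=1}^{n-\phi} d_j^2(t) \\
={} & |y(t) - x|^2 + 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( |d_j(t)| + |\delta_j(t)| \big) |y(t) - x_j(t)| \\
& - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j(x) \big) + \alpha^2(t) \sum_{j=1}^{n-\phi} d_j^2(t) \\
\le{} & |y(t) - x|^2 + 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| \\
& - 2\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1) \big( g_j(y(t)) - g_j(x) \big) + \alpha^2(t)(n - \phi)L^2.
\end{aligned}$$

The last inequality holds because $g_j(\cdot)$ is $L$-Lipschitz continuous for each $j \in \mathcal{V}$, so every
subgradient of $g_j$ is bounded in absolute value by $L$.

□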


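Proof of Lemma 6

Proof. Recall (7). For $t \ge 1$,

$$x(t) = \Phi(t - 1, 0)\, x(0) - \sum_{r=1}^{t} \alpha(r - 1)\, \Phi(t - 1, r)\, d(r - 1);$$

then each $x_i(t)$ can be written as

$$x_i(t) = \sum_{j=1}^{n-\phi} \Phi_{ij}(t - 1, 0)\, x_j(0) - \sum_{r=1}^{t} \alpha(r - 1) \sum_{j=1}^{n-\phi} \Phi_{ij}(t - 1, r)\, d_j(r - 1),$$

and (12) implies that $y(t) = \sum_{j=1}^{n-\phi} \pi_j(0)\, x_j(0) - \sum_{r=1}^{t} \alpha(r - 1) \sum_{j=1}^{n-\phi} \pi_j(r)\, d_j(r - 1)$. Thus

$$|y(t) - x_i(t)| \le \Big| \sum_{j=1}^{n-\phi} \big( \pi_j(0) - \Phi_{ij}(t - 1, 0) \big) x_j(0) \Big| + \Big| \sum_{r=1}^{t} \alpha(r - 1) \sum_{j=1}^{n-\phi} \big( \Phi_{ij}(t - 1, r) - \pi_j(r) \big) d_j(r - 1) \Big|. \qquad (21)$$

We bound the two terms in (21) separately. For the first term in (21), we have

$$\begin{aligned}
\Big| \sum_{j=1}^{n-\phi} \big( \pi_j(0) - \Phi_{ij}(t - 1, 0) \big) x_j(0) \Big|
&\le \sum_{j=1}^{n-\phi} |\pi_j(0) - \Phi_{ij}(t - 1, 0)|\, |x_j(0)| \\
&\overset{(a)}{\le} \max\{|u|, |U|\} \sum_{j=1}^{n-\phi} \gamma^{\lceil t/\nu \rceil} \\
&= (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil}, \qquad (22)
\end{aligned}$$

where inequality (a) follows from Theorem 3.

In addition, the second term in (21) can be bounded as follows.

$$\begin{aligned}
& \Big| \sum_{r=1}^{t} \alpha(r - 1) \sum_{j=1}^{n-\phi} \big( \Phi_{ij}(t - 1, r) - \pi_j(r) \big) d_j(r - 1) \Big| \\
&\overset{(a)}{\le} \sum_{r=1}^{t-1} \alpha(r - 1) \sum_{j=1}^{n-\phi} |\Phi_{ij}(t - 1, r) - \pi_j(r)|\, |d_j(r - 1)| + \alpha(t - 1) \Big| d_i(t - 1) - \sum_{j=1}^{n-\phi} \pi_j(t)\, d_j(t - 1) \Big| \\
&\le \sum_{r=1}^{t-1} \alpha(r - 1) \sum_{j=1}^{n-\phi} |\Phi_{ij}(t - 1, r) - \pi_j(r)|\, |d_j(r - 1)| + \alpha(t - 1) \sum_{j=1}^{n-\phi} \pi_j(t)\, |d_i(t - 1) - d_j(t - 1)| \\
&\le \sum_{r=1}^{t-1} \Big( \alpha(r - 1) \sum_{j=1}^{n-\phi} |\Phi_{ij}(t - 1, r) - \pi_j(r)| \Big) L + 2\alpha(t - 1)L \\
&\le (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L, \qquad (23)
\end{aligned}$$

where inequality (a) follows from the fact that $\Phi(t - 1, t) = \mathbf{I}$. Note that when $t = 1$, it holds that

$$\sum_{r=1}^{t-1} \Big( \alpha(r - 1) \sum_{j=1}^{n-\phi} |\Phi_{ij}(t - 1, r) - \pi_j(r)| \Big) = 0.$$

From (22) and (23), the LHS of (21) can be upper bounded by

$$|y(t) - x_i(t)| \le (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L.$$

The proof is complete.

□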


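Proof of Lemma 7

Proof. Recall (14),

$$|y(t) - x_i(t)| \le (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L.$$

Since $0 < \gamma < 1$ and $\lim_{t \to \infty} \alpha(t) = 0$, it is easy to see that the first term and the third term on
the RHS of (14) both converge. In particular,

$$\lim_{t \to \infty} (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} = 0,$$

and

$$\lim_{t \to \infty} 2\alpha(t - 1)L = 0.$$

Define

$$\ell(t) = \sum_{r=1}^{t} \alpha(r - 1)\, \gamma^{\lceil (t + 1 - r)/\nu \rceil}.$$

For any $t \ge 1$, we have

$$\begin{aligned}
\ell(t) &= \sum_{r=1}^{t} \alpha(r - 1)\, \gamma^{\lceil (t + 1 - r)/\nu \rceil} \\
&= \sum_{r=1}^{\lceil t/2 \rceil} \alpha(r - 1)\, \gamma^{\lceil (t + 1 - r)/\nu \rceil} + \sum_{r=\lceil t/2 \rceil + 1}^{t} \alpha(r - 1)\, \gamma^{\lceil (t + 1 - r)/\nu \rceil} \\
&\le \sum_{r=1}^{\lceil t/2 \rceil} \alpha(r - 1)\, \gamma^{(t + 1 - r)/\nu} + \sum_{r=\lceil t/2 \rceil + 1}^{t} \alpha(r - 1)\, \gamma^{(t + 1 - r)/\nu} \\
&\le \alpha(0) \sum_{r=1}^{\lceil t/2 \rceil} \gamma^{(t + 1 - r)/\nu} + \alpha\big( \lceil t/2 \rceil \big) \sum_{r=\lceil t/2 \rceil + 1}^{t} \gamma^{(t + 1 - r)/\nu} \\
&\le \alpha(0)\, \frac{\gamma^{t/(2\nu)}}{1 - \gamma^{1/\nu}} + \frac{\alpha\big( \lceil t/2 \rceil \big)}{1 - \gamma^{1/\nu}}.
\end{aligned}$$

Thus, we get

$$\limsup_{t \to \infty} \ell(t) \le \alpha(0) \lim_{t \to \infty} \frac{\gamma^{t/(2\nu)}}{1 - \gamma^{1/\nu}} + \lim_{t \to \infty} \frac{\alpha\big( \lceil t/2 \rceil \big)}{1 - \gamma^{1/\nu}} = 0 + 0 = 0.$$

Taking the limit superior on both sides of (14), we have

$$\begin{aligned}
\limsup_{t \to \infty} |y(t) - x_i(t)| &\le \lim_{t \to \infty} (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + \limsup_{t \to \infty}\, (n - \phi) L\, \ell(t - 1) + \lim_{t \to \infty} 2\alpha(t - 1)L \\
&\le 0 + 0 + 0 = 0.
\end{aligned}$$

On the other hand, since $|y(t) - x_i(t)| \ge 0$ for each $t$, it holds that

$$\liminf_{t \to \infty} |y(t) - x_i(t)| \ge 0.$$

Thus,

$$\limsup_{t \to \infty} |y(t) - x_i(t)| \le 0 \le \liminf_{t \to \infty} |y(t) - x_i(t)|.$$

By definition, we know $\liminf_{t \to \infty} |y(t) - x_i(t)| \le \limsup_{t \to \infty} |y(t) - x_i(t)|$.

Therefore, the limit of $|y(t) - x_i(t)|$ exists, and $\lim_{t \to \infty} |y(t) - x_i(t)| = 0$.

□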
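The bound on $\ell(t)$ derived in this proof is easy to probe numerically. The sketch below evaluates $\ell(t) = \sum_{r=1}^{t} \alpha(r-1)\gamma^{\lceil (t+1-r)/\nu \rceil}$ and the closed-form upper bound $\alpha(0)\gamma^{t/(2\nu)}/(1-\gamma^{1/\nu}) + \alpha(\lceil t/2 \rceil)/(1-\gamma^{1/\nu})$. The values $\gamma = 0.9$, $\nu = 3$ and the non-increasing step size $\alpha(t) = 1/(t+1)$ are assumptions chosen only for illustration.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def ell(t, alpha, gamma, nu):
    """ell(t) = sum_{r=1}^{t} alpha(r-1) * gamma^ceil((t+1-r)/nu)."""
    return sum(alpha(r - 1) * gamma ** ceil_div(t + 1 - r, nu)
               for r in range(1, t + 1))


def ell_bound(t, alpha, gamma, nu):
    """Upper bound from the proof; relies on alpha being non-increasing."""
    denom = 1.0 - gamma ** (1.0 / nu)
    return (alpha(0) * gamma ** (t / (2.0 * nu)) / denom
            + alpha(ceil_div(t, 2)) / denom)


def alpha(t):
    """Hypothetical non-increasing step size with alpha(t) -> 0."""
    return 1.0 / (t + 1)


gamma, nu = 0.9, 3  # illustrative values with 0 < gamma < 1
values = {t: ell(t, alpha, gamma, nu) for t in (20, 200, 2000)}
```

For these choices `values` decreases as `t` grows, and `ell(t, ...)` stays below `ell_bound(t, ...)`. The bound is loose for small `t`; what matters for Lemma 7 is that both of its terms vanish as $t \to \infty$.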
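Proof of Lemma 8

Proof. Since $\sum_{j=1}^{n-\phi} \pi_j(t + 1) = 1$, by Lemma 6 (whose bound is uniform over $j \in \mathcal{V} - \mathcal{F}$) we have, for each $t \ge 1$,

$$\begin{aligned}
\sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)|
&\le \sum_{j=1}^{n-\phi} \pi_j(t + 1) \Big( (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L \Big) \\
&= (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t - 1)L.
\end{aligned}$$

Using the inequality that for each $r$ and $t$

$$\alpha(t)\, \alpha(r - 1) \le \frac{1}{2} \big( \alpha^2(t) + \alpha^2(r - 1) \big),$$

we obtain

$$\begin{aligned}
\sum_{t=2}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)|
\le{} & \sum_{t=2}^{\infty} \Big( \alpha(t)(n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + (n - \phi) L \sum_{r=1}^{t-1} \alpha(t)\, \alpha(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + 2\alpha(t)\, \alpha(t - 1) L \Big) \\
\le{} & \sum_{t=2}^{\infty} \alpha(t)(n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil} + \frac{(n - \phi)L}{2} \sum_{t=2}^{\infty} \sum_{r=1}^{t-1} \alpha^2(t)\, \gamma^{\lceil (t - r)/\nu \rceil} \\
& + \frac{(n - \phi)L}{2} \sum_{t=2}^{\infty} \sum_{r=1}^{t-1} \alpha^2(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} + \sum_{t=2}^{\infty} \big( \alpha^2(t) + \alpha^2(t - 1) \big) L. \qquad (24)
\end{aligned}$$

To show $\sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| < \infty$, we show that each of the four terms on the RHS
of (24) is finite.

For the first term on the RHS of (24), we have

$$\begin{aligned}
\sum_{t=2}^{\infty} \alpha(t)(n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil t/\nu \rceil}
&= (n - \phi) \max\{|u|, |U|\} \sum_{t=2}^{\infty} \alpha(t)\, \gamma^{\lceil t/\nu \rceil} \\
&\overset{(a)}{\le} (n - \phi) \max\{|u|, |U|\}\, \alpha(1) \sum_{t=2}^{\infty} \gamma^{\lceil t/\nu \rceil} \\
&\le (n - \phi) \max\{|u|, |U|\}\, \alpha(1) \sum_{t=2}^{\infty} \gamma^{t/\nu} \\
&\le (n - \phi) \max\{|u|, |U|\}\, \frac{\alpha(1)}{1 - \gamma^{1/\nu}} < \infty. \qquad (25)
\end{aligned}$$

Inequality (a) holds due to the fact that $\alpha(t) \le \alpha(1)$ for all $t \ge 1$.

For the second term on the RHS of (24), we have

$$\begin{aligned}
\frac{(n - \phi)L}{2} \sum_{t=2}^{\infty} \sum_{r=1}^{t-1} \alpha^2(t)\, \gamma^{\lceil (t - r)/\nu \rceil}
&\le \frac{(n - \phi)L}{2} \sum_{t=2}^{\infty} \alpha^2(t) \sum_{r=1}^{t-1} \gamma^{(t - r)/\nu} \\
&\le \frac{(n - \phi)L}{2(1 - \gamma^{1/\nu})} \sum_{t=2}^{\infty} \alpha^2(t) \\
&< \infty \quad \text{due to the fact that } \sum_{t=0}^{\infty} \alpha^2(t) < \infty. \qquad (26)
\end{aligned}$$

For the fourth term on the RHS of (24), we get

$$\sum_{t=2}^{\infty} \big( \alpha^2(t) + \alpha^2(t - 1) \big) L = \sum_{t=2}^{\infty} L \alpha^2(t) + \sum_{t=2}^{\infty} L \alpha^2(t - 1) < \infty. \qquad (27)$$

For the third term on the RHS of (24), for any fixed $N$, we get

$$\begin{aligned}
\frac{(n - \phi)L}{2} \sum_{t=2}^{N} \sum_{r=1}^{t-1} \alpha^2(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil}
&\le \frac{(n - \phi)L}{2} \sum_{t=2}^{N} \sum_{r=1}^{t-1} \alpha^2(r - 1)\, \gamma^{(t - r)/\nu} \\
&= \frac{(n - \phi)L}{2} \sum_{r=1}^{N-1} \alpha^2(r - 1) \sum_{t=1}^{N-r} \gamma^{t/\nu} \\
&\le \frac{(n - \phi)L}{2(1 - \gamma^{1/\nu})} \sum_{r=1}^{N-1} \alpha^2(r - 1).
\end{aligned}$$

Thus, we get

$$\frac{(n - \phi)L}{2} \sum_{t=2}^{\infty} \sum_{r=1}^{t-1} \alpha^2(r - 1)\, \gamma^{\lceil (t - r)/\nu \rceil} \le \frac{(n - \phi)L}{2(1 - \gamma^{1/\nu})} \sum_{r=1}^{\infty} \alpha^2(r - 1) < \infty. \qquad (28)$$

In addition, for $t = 0$, it holds that $|y(0) - x_j(0)| \le U - u$. For $t = 1$, by Lemma 6, we have

$$\sum_{j=1}^{n-\phi} \pi_j(2)\, |y(1) - x_j(1)| \le (n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil 1/\nu \rceil} + 2\alpha(0)L.$$

Thus,

$$\begin{aligned}
\alpha(0) \sum_{j=1}^{n-\phi} \pi_j(1)\, |y(0) - x_j(0)| + \alpha(1) \sum_{j=1}^{n-\phi} \pi_j(2)\, |y(1) - x_j(1)|
&\le \alpha(0)(U - u) + 2\alpha(1)\alpha(0)L + \alpha(1)(n - \phi) \max\{|u|, |U|\}\, \gamma^{\lceil 1/\nu \rceil} \\
&< \infty. \qquad (29)
\end{aligned}$$

By (25), (26), (27), (28) and (29), we conclude that

$$\sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t + 1)\, |y(t) - x_j(t)| < \infty,$$

proving the lemma.

□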


Proof of Theorem 4 

Proof. Recall that each gi{-) is defined as 

gi{x) = Aiihi{x) + A 2 ih 2 {x) + ... + Akihk{x), 

for i € V, where Aji > 0 and ^ji = 1- Let T* = argmin gi{x) and Yj = argmin Ajihj{x) for 

j = 1,... ,k. Since for each j G {1,..., A;} such that Aji = 0, argmin Ajihj{x) = 0 is a constant 
function over the whole real line, it holds that Yj = R. Since positive constant scaling does not 
affect the optimal set of a function, for each j G {1,... , A:} such that Aji > 0, it holds that YJ = Xj. 
In addition, because hi{x),, hk{x) are solution redundant functions, i.e., ^ 0, functions 

Aiihi{x),... ,Akihk{x) are also solution redundant. By Proposition 2 we have 

T* = 5 for all i G V. 

Let x' G X. Dehne g* as the optimal value of function gj{-) for each j G V. We have 

n—cj) 

\y{t + 1) - x'p < \y(t) - x'p + 4La(t) ^ TTj{t + l)\y(t) - Xj(t)\ 

j = ^ 

n—(j) 

- 2a(A) ^ -Kj{t + 1) [gj {y{t)) - gj{x')) + 

i=i 

n—(p 

- \y{t) - x'\^ + 4La(A) ^ TTjit + 1) \y{t) - Xj{t)\ 

j = ^ 

n—cj) 

- 2a(A) ^ 7rj(t + 1) {gj {y{t)) - g*) + a^{t){n - (l))L?. (30) 

i=i 

Equality (a) holds because of x' G X C Y^ for each j G V, then gj{x') = gj. 

For each i > 0, define 

= \yit) - x'f, 

n—(l> 

bt = 2a{t) ^ TTj{t + 1) {gj{y{t)) - gj), 

3 = ^ 
n—cf) 

Ct = 4:La{t) ^ Trj{t + l)|y(t) - Xj{t)\ + a^{f){n — (t))L^. 
i=i 
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It is easy to see that at > 0 and c* > 0 for each t. Since Qj is the optimal value of function gj{-), it 
holds that 6* > 0 for each t. Thus, {at}^o> {qI^o three non-negative sequences. 

By (30), it holds that 

CLt+i < at — bt + ct for each t > 0. 


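By Lemma 8, it holds that

\[
4L \sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t+1)\, |y(t) - x_j(t)| < \infty.
\]

In addition, since $\sum_{t=0}^{\infty} \alpha^2(t) < \infty$, it holds that

\[
(n-\phi)L^2 \sum_{t=0}^{\infty} \alpha^2(t) < \infty.
\]

Thus, we get

\[
\begin{aligned}
\sum_{t=0}^{\infty} c_t &= \sum_{t=0}^{\infty} \left( 4L\alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t+1)\, |y(t) - x_j(t)| + \alpha^2(t)(n-\phi)L^2 \right) \\
&= 4L \sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t+1)\, |y(t) - x_j(t)| + (n-\phi)L^2 \sum_{t=0}^{\infty} \alpha^2(t) \\
&< \infty.
\end{aligned}
\tag{31}
\]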


Therefore, applying Lemma 4 to the sequences $\{a_t\}_{t=0}^{\infty}$, $\{b_t\}_{t=0}^{\infty}$ and $\{c_t\}_{t=0}^{\infty}$, we have that for any $x' \in X$, $a_t = |y(t) - x'|^2$ converges, and

\[
\sum_{t=0}^{\infty} b_t = 2 \sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t+1) \left( g_j(y(t)) - g_j^* \right) < \infty.
\tag{32}
\]

Since $|y(t) - x'|$ converges for any fixed $x' \in X$, by the definition of sequence convergence and the dynamics of $y(t)$ in (12), it is easy to see that $y(t)$ also converges. Let $\lim_{t \to \infty} y(t) = y$. Next we show that $y \in X$.


By continuity of $h(\cdot)$, we have

\[
\lim_{t \to \infty} h(y(t)) = h\left( \lim_{t \to \infty} y(t) \right) = h(y).
\]

Equivalently, for any $\epsilon > 0$, there exists $T$ such that for any $t \ge T$, it holds that

\[
|h(y(t)) - h(y)| \le \epsilon.
\]

Suppose $y \notin X$, then $h(y) - h^* > 0$. Let $\epsilon_0 = \frac{h(y) - h^*}{2}$. Then there exists $T_0$ such that for any $t \ge T_0$, it holds that

\[
|h(y(t)) - h(y)| \le \epsilon_0.
\tag{33}
\]
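Let $\mathcal{I}_{t+1} \subseteq \mathcal{V} - \mathcal{F}$ be the set of indices such that for each $j \in \mathcal{I}_{t+1}$, $\pi_j(t+1) \ge \beta^{\nu}$. As $G(\mathcal{V}, \mathcal{E})$ satisfies Condition 2, $|\mathcal{I}_{t+1}| \ge \max\{k', f+1\}$. Since $g_j(y(t)) - g_j^* \ge 0$ for all $j$, then

\[
\begin{aligned}
\sum_{j=1}^{n-\phi} \pi_j(t+1) \left( g_j(y(t)) - g_j^* \right)
&\ge \sum_{j \in \mathcal{I}_{t+1}} \pi_j(t+1) \left( g_j(y(t)) - g_j^* \right) \\
&\ge \beta^{\nu} \sum_{j \in \mathcal{I}_{t+1}} \left( g_j(y(t)) - g_j^* \right) \\
&= \beta^{\nu} \sum_{j \in \mathcal{I}_{t+1}} \sum_{i=1}^{k} A_{ij} \left( h_i(y(t)) - h_i^* \right) \\
&= \beta^{\nu} \sum_{i=1}^{k} \left( \sum_{j \in \mathcal{I}_{t+1}} A_{ij} \right) \left( h_i(y(t)) - h_i^* \right) \\
&\ge k \beta^{\nu} C_2 \left( h(y(t)) - h^* \right),
\end{aligned}
\tag{34}
\]

where

\[
C_2 = \min_{1 \le i \le k,\ \mathcal{I} \subseteq \mathcal{V}:\ |\mathcal{I}| \ge \max\{k', f+1\}} \ \sum_{j \in \mathcal{I}} A_{ij},
\]

and the last inequality follows from the fact that $h_i(y(t)) - h_i^* \ge 0$. In addition, as $sp(A) = k'$, then $\sum_{j \in \mathcal{I}} A_{ij} > 0$ for every $\mathcal{I} \subseteq \mathcal{V}$ with $|\mathcal{I}| \ge \max\{k', f+1\}$. Since $A$ is finite, $C_2$ is well-defined and $C_2 > 0$. Using (34), the sum in (32) can be bounded from below as follows.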


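\[
\begin{aligned}
\sum_{t=0}^{\infty} \alpha(t) \sum_{j=1}^{n-\phi} \pi_j(t+1) \left( g_j(y(t)) - g_j^* \right)
&\ge \sum_{t=0}^{\infty} \alpha(t)\, k \beta^{\nu} C_2 \left( h(y(t)) - h^* \right) \\
&\ge \sum_{t=T_0}^{\infty} \alpha(t)\, k \beta^{\nu} C_2 \left( h(y(t)) - h^* \right) && \text{as } h(y(t)) - h^* \ge 0,\ \forall t \\
&\ge \sum_{t=T_0}^{\infty} \alpha(t)\, k \beta^{\nu} C_2 \left( h(y) - h^* - \epsilon_0 \right) && \text{by (33)} \\
&= \sum_{t=T_0}^{\infty} \alpha(t)\, k \beta^{\nu} C_2\, \epsilon_0 && \text{since } \epsilon_0 = \tfrac{h(y) - h^*}{2} \\
&= \infty && \text{since } \sum_{t=0}^{\infty} \alpha(t) = \infty.
\end{aligned}
\]

This contradicts the fact that (32) holds. Thus, the assumption that $y \notin X$ does not hold.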
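The contradiction argument needs the constant $C_2$ from (34) to be strictly positive. Because $C_2$ is a minimum over finitely many index sets, it can be computed by brute force for small instances. The following is a minimal sketch under stated assumptions: `A[i][j]` stores the weight of input function $h_i$ in the local function $g_j$, the argument `m` stands for $\max\{k', f+1\}$ and is supplied by the caller, and the matrix below is a hypothetical example rather than one taken from this report.

```python
from itertools import combinations
from typing import List


def c2(A: List[List[float]], m: int) -> float:
    """Brute-force min over rows i and column sets I with |I| >= m of sum_{j in I} A[i][j]."""
    k, n = len(A), len(A[0])
    best = float("inf")
    for size in range(m, n + 1):
        for index_set in combinations(range(n), size):
            for i in range(k):
                best = min(best, sum(A[i][j] for j in index_set))
    return best


# Hypothetical weights for k = 2 input functions and n = 4 agents.
# Each column sums to 1, so every g_j is a convex combination of h_1 and h_2.
A = [
    [1.0, 0.5, 0.0, 0.25],
    [0.0, 0.5, 1.0, 0.75],
]
print(c2(A, 2), c2(A, 1))
```

With the threshold `m = 2`, every admissible set of agents carries positive total weight on each input function, so the computed constant is positive. With `m = 1`, a single agent that ignores one input function drives the minimum to zero, which illustrates why the lower bound on $|\mathcal{I}_{t+1}|$ matters.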


Therefore, we conclude that $y \in X$. That is, there exists $x^* \in X$ such that $y = x^*$ and

\[
\lim_{t \to \infty} |y(t) - x^*| = 0.
\]

By the triangle inequality, we have

\[
|x_i(t) - x^*| \le |x_i(t) - y(t)| + |y(t) - x^*|.
\tag{35}
\]
Then, by Lemma 7 and (35), we have

\[
\limsup_{t \to \infty} |x_i(t) - x^*| \le \lim_{t \to \infty} |x_i(t) - y(t)| + \lim_{t \to \infty} |y(t) - x^*| = 0.
\]

On the other hand, $\liminf_{t \to \infty} |x_i(t) - x^*| \ge 0$. Thus, the limit of $|x_i(t) - x^*|$ exists and

\[
\lim_{t \to \infty} |x_i(t) - x^*| = 0,
\]

proving Theorem 4.

□ 


